high performance computing on graphics processing units: hgpu.org

Posts

Oct, 10

Computational intensive Tasks in Multimedia Signal Processing

Driven by the gaming industry and the great emphasis placed on the visual sense, graphics processing units (GPUs) have improved their performance in recent years, even surpassing the computational capacity of single-core CPUs. In fact, multi-core architectures are nowadays common for both CPUs and GPUs in order to exploit parallelism in computing. In this […]

CUDA

Oct, 10

A GPU-Accelerated Parallel Preconditioner for the Solution of the Boltzmann Transport Equation for Semiconductors

The solution of large systems of linear equations is typically achieved by iterative methods. The rate of convergence of these methods can be substantially improved by the use of preconditioners, which can be either applied in a black-box fashion to the linear system, or exploit properties specific to the underlying problem for maximum efficiency. However, […]

OpenCL

Oct, 10

Anti-parallel Patterns in Fine-grain Data-parallel Programs

Parallel systems and parallel programming are becoming increasingly important. The developer in want of raw speed can no longer expect sequential processors to become faster and needs to turn to parallel platforms and parallel programs to satisfy his need for speed. But writing a parallel program is difficult and writing one with a decent […]

CUDA

Oct, 10

Benchmarks Based on Anti-Parallel Patterns for the Evaluation of GPUs

We put forward "anti-parallel patterns" to guide the parallel performance analysis process. Anti-parallel patterns or APPs are common parts of parallel programs that cause these programs to have less than ideal performance, where the ideal speedup equals the number of processors. We present benchmarks to model the behavior of APPs on parallel platforms. Each benchmark […]

OpenCL

Oct, 10

Streaming-Oriented Parallelization of Domain-Independent Irregular Kernels

Current parallelizing and optimizing compilers use techniques for the recognition of computational kernels to improve the quality of the target code. Domain-independent kernels characterize the computations carried out in an application, independently of the implementation details of a given programming language. This paper presents streaming-oriented parallelizing transformations for irregular assignment and irregular reduction kernels. The […]

Oct, 10

Evaluation of GPU Architectures Using Spiking Neural Networks

In recent years, General-Purpose Graphical Processing Units (GP-GPUs) have entered the field of High-Performance Computing (HPC) as one of the primary architectural focuses for many research groups working with complex scientific applications. Nvidia’s Tesla C2050, codenamed Fermi, and AMD’s Radeon 5870 are two devices positioned to meet the computationally demanding needs of supercomputing research groups […]

OpenCL

Oct, 10

Towards an Effective Unified Programming Model for Many-Cores

Building an effective programming model for many-core processors is challenging. On the one hand, the increasing variety of platforms and their specific programming models force users to take a hardware-centric approach not only for implementing parallel applications, but also for designing them. This approach diminishes portability and, eventually, limits performance. On the other hand, to […]

OpenCL

Oct, 10

Denoising Volumetric Data on GPU

Volumetric data is being used more and more in everyday aspects of our lives. Processing such data is computationally expensive, and until now more sophisticated algorithms could not be used. The possibilities for processing such data have widened considerably with the increase of parallel computational power in modern GPUs. We present a novel […]

OpenCL

Oct, 10

Data-Parallel Construction of delta_N-Nets with Maximum Dispersion

Linear nearest-neighbor search in high-dimensional data exhibits high computational complexity. In order to minimize search complexity we employ optimal delta-nets of rank N, which consist of a small subset of N vectors out of an initial codebook E, yet approximate all En vectors of E by the least error of all possible selections […]

OpenCL

Oct, 10

GPUs, a New Tool of Acceleration in CFD: Efficiency and Reliability on Smoothed Particle Hydrodynamics Methods

Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a […]

CUDA

Oct, 9

Computer Vision Models in Surveillance Robotics

In this thesis, we developed algorithms that use visual information to automatically perform, in real time, detection, recognition and categorisation of moving objects independently of the environmental conditions and with the best accuracy. To this end, we built upon several concepts of computer vision, namely the identification of the objects of interest in the whole […]

CUDA

Oct, 9

Real time ultrasound image denoising

Image denoising is the process of removing the noise that perturbs image analysis methods. In some applications like segmentation or registration, denoising is intended to smooth homogeneous areas while preserving the contours. In many applications like video analysis, visual servoing or image-guided surgical interventions, real-time denoising is required. This paper presents a method for real-time […]

CUDA
