high performance computing on graphics processing units: hgpu.org

Posts

Jan, 24

CUDASA: Compute Unified Device and Systems Architecture

We present an extension to the CUDA programming language which extends parallelism to multi-GPU systems and GPU-cluster environments. Following the existing model, which exposes the internal parallelism of GPUs, our extended programming language provides a consistent development interface for additional, higher levels of parallel abstraction from the bus and network interconnects. The newly introduced layers […]

CUDA

Jan, 24

Sparse regularization in MRI iterative reconstruction using GPUs

Regularization is a common technique used to improve image quality in inverse problems such as MR image reconstruction. In this work, we extend our previous Graphics Processing Unit (GPU) implementation of MR image reconstruction with compensation for susceptibility-induced field inhomogeneity effects by incorporating an additional quadratic regularization term. Regularization techniques commonly impose the prior information […]

CUDA

Jan, 24

Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures

Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communications or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In […]

CUDA

Jan, 24

Multi-GPU Implementation for Iterative MR Image Reconstruction with Field Correction

Many advanced MRI image acquisition and reconstruction methods see limited application due to high computational cost in MRI. For instance, iterative reconstruction algorithms (e.g. non-Cartesian k-space trajectory, or magnetic field inhomogeneity compensation) can improve image quality but suffer from low reconstruction speed. General-purpose computing on graphics processing units (GPU) has demonstrated significant performance speedups and […]

CUDA

Jan, 24

Accelerating iterative field-compensated MR image reconstruction on GPUs

We propose a fast implementation for iterative MR image reconstruction using Graphics Processing Units (GPU). In MRI, iterative reconstruction with conjugate gradient algorithms allows for accurate modeling of the physics of the imaging system. Specifically, methods have been reported to compensate for the magnetic field inhomogeneity induced by the susceptibility differences near the air/tissue interface in […]

CUDA

Jan, 24

Data Layout Transformation for Structured-Grid Codes on GPU

We present data layout transformation as an effective performance optimization for memory-bound structured-grid applications for GPUs. Structured-grid applications are a class of applications that compute grid cell values on a regular 2D, 3D or higher-dimensional grid. Each output point is computed as a function of itself and its nearest neighbors. Stencil code […]

CUDA
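
To make the structured-grid pattern in the abstract above concrete, here is a minimal sketch of a stencil update in which each interior point is recomputed from itself and its four nearest neighbors. The function name `five_point_stencil`, the equal averaging weights, and the row-major flat-list layout are assumptions chosen for illustration; they are not taken from the paper, whose layout transformation is not described in the truncated excerpt.

```python
def five_point_stencil(grid, nx, ny):
    """Apply one averaging step of a 5-point stencil to a 2D grid.

    `grid` is a flat, row-major list of length nx * ny (an assumed layout).
    Boundary cells are copied unchanged; each interior cell becomes the mean
    of itself and its left, right, upper, and lower neighbors.
    """
    if len(grid) != nx * ny:
        raise ValueError("grid length must equal nx * ny")
    out = list(grid)  # copy so every read sees the old values
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            i = y * nx + x  # row-major index of cell (x, y)
            out[i] = (grid[i] + grid[i - 1] + grid[i + 1]
                      + grid[i - nx] + grid[i + nx]) / 5.0
    return out


# A 3x3 grid with a single non-zero value in the center.
result = five_point_stencil([0, 0, 0,
                             0, 5, 0,
                             0, 0, 0], nx=3, ny=3)
```

In row-major order the left and right neighbors are adjacent in memory while the upper and lower neighbors are `nx` elements away, which is the kind of access pattern a layout change would aim to improve on memory-bound hardware.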

Jan, 24

Program Optimization Strategies for Data-Parallel Many-Core Processors

Program optimization for highly parallel systems has historically been considered an art, with experts doing much of the performance tuning by hand. With the introduction of inexpensive, single-chip, massively parallel platforms, more developers will be creating highly data-parallel applications for these platforms while lacking the substantial experience and knowledge needed to maximize application performance. In […]

CUDA

Jan, 24

Efficient Parallel Scan Algorithms for GPUs

Scan and segmented scan algorithms are crucial building blocks for a great many data-parallel algorithms. Segmented scan and related primitives also provide the necessary support for the flattening transform, which allows for nested data-parallel programs to be compiled into flat data-parallel languages. In this paper, we describe the design of efficient scan and segmented scan […]

CUDA
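
As a reference for what scan and segmented scan compute, the sketch below gives sequential definitions plus a round-based variant in the style of the well-known Hillis-Steele scan, where each round could in principle run in parallel. All function names and the head-flag representation of segments are assumptions for illustration; this is not the algorithm design from the paper, which the truncated abstract does not describe.

```python
from operator import add


def inclusive_scan(values, op=add):
    """Sequential inclusive scan: out[i] = values[0] op ... op values[i]."""
    out = []
    for v in values:
        out.append(v if not out else op(out[-1], v))
    return out


def segmented_inclusive_scan(values, head_flags, op=add):
    """Inclusive scan that restarts wherever head_flags[i] is truthy."""
    if len(values) != len(head_flags):
        raise ValueError("values and head_flags must have equal length")
    out = []
    for v, is_head in zip(values, head_flags):
        if is_head or not out:
            out.append(v)  # start of a new segment
        else:
            out.append(op(out[-1], v))
    return out


def hillis_steele_scan(values, op=add):
    """Round-based inclusive scan; each round reads only the previous round."""
    out = list(values)
    offset = 1
    while offset < len(out):
        prev = out
        # Every element update in this round is independent of the others.
        out = [prev[i] if i < offset else op(prev[i - offset], prev[i])
               for i in range(len(prev))]
        offset *= 2
    return out


data = [3, 1, 7, 0, 4, 1, 6, 3]
print(inclusive_scan(data))      # [3, 4, 11, 11, 15, 16, 22, 25]
print(hillis_steele_scan(data))  # same result, computed in log2(n) rounds
print(segmented_inclusive_scan([1, 2, 3, 4, 5, 6, 7, 8],
                               [1, 0, 0, 1, 0, 1, 0, 0]))
```

The round-based version performs more total operations than the sequential one, a trade-off that work-efficient GPU scans are typically designed to reduce.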

Jan, 24

Efficient Sparse Matrix-Vector Multiplication on CUDA

The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. While dense linear algebra readily maps to such platforms, harnessing this potential for sparse matrix computations presents additional challenges. Given its role in iterative methods for solving sparse linear systems and eigenvalue problems, sparse matrix-vector multiplication (SpMV) is of […]

CUDA
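
The SpMV operation named in the abstract above can be sketched as follows. The excerpt is cut off before any storage format is mentioned, so the compressed sparse row (CSR) layout used here is an assumption, as are the function name `csr_spmv` and its parameter names; it is a plain sequential reference, not the paper's kernel.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[r]:row_ptr[r + 1] is the slice of col_idx / vals that holds
    the non-zero entries of row r.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # rows are independent of one another
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[1, 7, 0, 0],
#      [0, 2, 8, 0],
#      [5, 0, 3, 9],
#      [0, 6, 0, 4]]
row_ptr = [0, 2, 4, 7, 9]
col_idx = [0, 1, 1, 2, 0, 2, 3, 1, 3]
vals = [1, 7, 2, 8, 5, 3, 9, 6, 4]

y = csr_spmv(row_ptr, col_idx, vals, [1, 2, 3, 4])
print(y)  # [15.0, 28.0, 50.0, 28.0]
```

Because each row's dot product is independent, the outer loop is the natural place to parallelize; the irregular row lengths and the indirect reads of `x` are what make the sparse case harder to map to a GPU than dense linear algebra.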

Jan, 23

Parallel Genetic Algorithms on Programmable Graphics Hardware

Parallel genetic algorithms are usually implemented on parallel machines or distributed systems. This paper describes how fine-grained parallel genetic algorithms can be mapped to the programmable graphics hardware found in commodity PCs. Our approach stores chromosomes and their fitness values in texture memory on the graphics card. Both fitness evaluation and genetic operations are implemented entirely with […]

Jan, 23

Parallel Evolutionary Algorithms on Consumer-Level Graphics Processing Unit

Evolutionary Algorithms (EAs) are effective and robust methods for solving many practical problems such as feature selection, electrical circuit synthesis, and data mining. However, they may execute for a long time on some difficult problems, because several fitness evaluations must be performed. A promising approach to overcome this limitation is to parallelize these algorithms. In […]

Jan, 23

Parallel hybrid genetic algorithms on Consumer-Level graphics hardware

In this paper, we report a parallel hybrid genetic algorithm (HGA) on consumer-level graphics cards. HGA extends the classical genetic algorithm by incorporating the Cauchy mutation operator from evolutionary programming. In our parallel HGA, all steps except the random number generation procedure are performed on the graphics processing unit (GPU) and thus our parallel HGA can […]

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
