high performance computing on graphics processing units: hgpu.org

Posts

Apr, 22

AMD Fusion Developer Summit 2011, AFDS 2011

Heterogeneous computing is moving into the mainstream, and a broader range of applications is already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. Attend the AMD Fusion Developer Summit to learn about the opportunities that lie ahead.

OpenCL

Apr, 21

Pretty Good Accuracy in Matrix Multiplication with GPUs

With systems such as Roadrunner, there is a trend in supercomputing to offload parallel tasks to special-purpose co-processors composed of many relatively simple scalar processors. The cheaper commodity-class equivalent of such a processor would be the graphics card, potentially offering supercomputer power within the confines of a desktop PC. Graphics […]

CUDA

Apr, 21

Using graphics processors to accelerate the computation of the matrix inverse

We study the use of massively parallel architectures for computing a matrix inverse. Two different algorithms are reviewed: the traditional approach based on Gaussian elimination and the Gauss-Jordan elimination alternative. Several high-performance implementations are presented and evaluated. The target architecture is a current general-purpose multicore processor (CPU) connected to a graphics processor (GPU). […]
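As a rough illustration of the Gauss-Jordan alternative mentioned in this abstract, the following is a minimal sequential sketch in plain Python. It is not the paper's implementation: the function name `gauss_jordan_inverse`, the partial-pivoting strategy, and the singularity tolerance are assumptions chosen for the example.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination.

    Hypothetical helper for illustration; uses partial pivoting.
    """
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the column from every other row (above and below).
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


inverse = gauss_jordan_inverse([[4, 7], [2, 6]])
```

The row updates inside the elimination loop are independent of one another, which is the property that makes this scheme a natural candidate for parallel hardware.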

Apr, 21

Fast Variable Center-Biased Windowing for High-Speed Stereo on Programmable Graphics Hardware

We present a high-speed dense stereo algorithm that achieves both good quality results and very high disparity estimation throughput on the graphics processing unit (GPU). The key idea is a variable center-biased windowing approach, enabling an adaptive selection of the most suitable support patterns with varying sizes and shapes. As the fundamental construct for variable […]

Apr, 21

Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation

We propose a system-independent representation of sparse matrix formats that allows a compiler to generate efficient, system-specific code for sparse matrix operations. To show the viability of such a representation we have developed a compiler that generates and tunes code for sparse matrix-vector multiplication (SpMV) on GPUs. We evaluate our framework on six state-of-the-art matrix […]

CUDA • OpenCL

Apr, 21

Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures

Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple-level parallelism and memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to efficiently parallelize for GPUs due to irregular […]

Apr, 21

Assessment of GPU computational enhancement to a 2D flood model

This paper presents a study of the computational enhancement of a Graphics Processing Unit (GPU) enabled 2D flood model. The objectives are to demonstrate the significant speedup of a new GPU-enabled full dynamic wave flood model and to present the effect of model spatial resolution on its speedup. A 2D dynamic flood model based on […]

CUDA

Apr, 21

Solving knapsack problems on GPU

A parallel implementation via CUDA of the dynamic programming method for the knapsack problem on an NVIDIA GPU is presented. A GTX 260 card with 192 cores (1.4 GHz) is used for computational tests, and processing times obtained with the parallel code are compared with those of the sequential code on a 3.0 GHz Intel Xeon CPU. The […]

CUDA
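For readers unfamiliar with the dynamic programming method named in this abstract, here is a minimal sequential sketch of the textbook 0/1 knapsack recurrence in plain Python. It is not the paper's CUDA code; the function name `knapsack_max_value` and the two-array layout are assumptions made for illustration.

```python
def knapsack_max_value(weights, values, capacity):
    """Best total value for the 0/1 knapsack problem (illustrative helper)."""
    # prev[c] = best value using the items seen so far with capacity c.
    prev = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Each entry of the new row reads only the previous row, so all
        # capacities could in principle be updated independently.
        cur = [
            max(prev[c], prev[c - w] + v) if c >= w else prev[c]
            for c in range(capacity + 1)
        ]
        prev = cur
    return prev[capacity]


best = knapsack_max_value([10, 20, 30], [60, 100, 120], 50)
```

Keeping separate `prev` and `cur` rows, rather than updating one array in place, makes the independence between capacities explicit, at the cost of one extra array.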

Apr, 21

Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs

The graphics processing unit (GPU) has become an attractive coprocessor for scientific computing due to its massive processing capability. Sparse matrix-vector multiplication (SpMV) is a critical operation in a wide variety of scientific and engineering applications, such as sparse linear algebra and image processing. This paper presents an auto-tuning framework that can automatically compute and […]

CUDA
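To make the SpMV operation discussed in these abstracts concrete, the sketch below computes y = A·x for a matrix stored in compressed sparse row (CSR) form, in plain Python. CSR is chosen here as an assumption, since the abstracts do not say which formats are used, and the function name `csr_spmv` is hypothetical.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Multiply a CSR matrix by a dense vector x (illustrative helper)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i live in vals[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


# The 3x3 matrix [[10, 0, 0], [0, 20, 30], [40, 0, 50]] in CSR form.
row_ptr = [0, 1, 3, 5]
col_idx = [0, 1, 2, 0, 2]
vals = [10.0, 20.0, 30.0, 40.0, 50.0]
y = csr_spmv(row_ptr, col_idx, vals, [1.0, 2.0, 3.0])
```

The indirect read `x[col_idx[k]]` and the varying number of nonzeros per row are the irregularities that make this loop hard to map efficiently onto a GPU, which is why tuning choices such as work per thread matter.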

Apr, 21

A performance prediction model for the CUDA GPGPU platform

The significant growth in the computational power of modern graphics processing units (GPUs), coupled with the advent of general-purpose programming environments like NVIDIA’s CUDA, has seen GPUs emerge as a very popular parallel computing platform. Until recently, there was no performance model for GPGPUs. The absence of such a model makes it difficult […]

CUDA

Apr, 21

Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit

Particle-in-cell (PIC) simulations with Monte-Carlo collisions are used in plasma science to explore a variety of kinetic effects. One major problem is the long run-time of such simulations. Even on modern computer systems, PIC codes take a considerable amount of time for convergence. Most of the computations can be massively parallelized, since particles behave independently […]

CUDA

Apr, 20

Exploring scalability of FIR filter realizations on Graphics Processing Units

General-purpose computing on graphics processing units (GPGPU) has lately been of great interest due to the release of architectures and software that simplify programming graphics cards. This study explores how the performance of FIR digital filters scales as the number of taps and the number of samples are varied. We also discuss the trade-offs with various techniques for GPGPU […]

CUDA
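The two scaling parameters in this abstract, the number of taps and the number of samples, correspond to the two loops of a direct-form FIR filter. The following plain-Python sketch shows that structure; it is a reference formulation only, not one of the GPU realizations studied, and the function name `fir_filter` is hypothetical.

```python
def fir_filter(taps, samples):
    """Direct-form FIR: y[n] = sum over k of taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            # Samples before the start of the signal are treated as zero.
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


# A two-tap moving average applied to a short ramp.
smoothed = fir_filter([0.5, 0.5], [1.0, 2.0, 3.0, 4.0])
```

Total work grows with the product of the tap count and the sample count, and each output sample can be computed independently of the others.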
