high performance computing on graphics processing units: hgpu.org

Posts

Dec, 18

Long time-scale simulations of in vivo diffusion using GPU hardware

To address the problem of performing long time simulations of biochemical pathways under in vivo cellular conditions, we have developed a lattice-based, reaction-diffusion model that uses the graphics processing unit (GPU) as a computational co-processor. The method has been specifically designed from the beginning to take advantage of the GPU’s capacity to perform massively parallel […]

CUDA

Dec, 18

Large-scale FFT on GPU clusters

A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g., matrix multiplication and LINPACK) and bandwidth-intensive tasks with data locality (e.g., finite-difference simulation). Bandwidth-intensive tasks such as large-scale FFTs without data locality are harder to accelerate, as the bottleneck often lies with the PCI between […]

CUDA

Dec, 18

Shader Performance Analysis on a Modern GPU Architecture

This paper presents an analysis of the performance of the shader processing units in a modern graphics processing unit (GPU) architecture using real graphics applications. The architecture of a modern GPU is described, and a simulator and associated framework used to evaluate the architecture are introduced. The paper analyses the effects on performance of different […]

OpenGL

Dec, 18

GPU clusters for high-performance computing

Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenges. In this paper, we present our efforts to address some of the challenges with building and running GPU clusters in HPC environments. We touch upon such issues as balanced cluster […]

CUDA

Dec, 18

Accelerating Template-Based Matching on the GPU for AR Applications

Recently, researchers have shown that it is possible to use GPU hardware for image processing and computer vision algorithms. We have been exploring how to use GPU hardware to improve marker-based tracking for AR applications. In this paper we describe our findings and the issues we explored in the context of a standard fiducial tracking pipeline. We […]

Dec, 18

Accelerating SQL Database Operations on a GPU with CUDA

Prior work has shown dramatic acceleration for various database operations on GPUs, but only using primitives that are not part of conventional database languages such as SQL. This paper implements a subset of the SQLite command processor directly on the GPU. This dramatically reduces the effort required to achieve GPU acceleration by avoiding the need […]

CUDA

Dec, 18

Efficient, High-Quality Bayer Demosaic Filtering on GPUs

This paper describes a series of optimizations for implementing the high-quality Malvar-He-Cutler Bayer demosaicing filter on a GPU in OpenGL. Applying this filter is the first step in most video-processing pipelines but is generally considered too slow for real time on a CPU. The optimized implementation contains 66% fewer ALU operations than a direct GPU […]

OpenGL

Dec, 18

GPU-based Island Model for Evolutionary Algorithms

The island model for evolutionary algorithms makes it possible to delay the global convergence of the evolution process and encourage diversity. However, solving large, time-intensive combinatorial optimization problems with the island model requires a large amount of computational resources. GPU computing has recently emerged as a powerful way to harness these resources. In this paper, […]

CUDA

Dec, 18

Accelerating K-Means on the Graphics Processor via CUDA

In this paper an optimized k-means implementation on the graphics processing unit (GPU) is presented. NVIDIA's compute unified device architecture (CUDA), available from the G80 GPU family onwards, is used as the programming environment. Emphasis is placed on optimizations directly targeted at this architecture to best exploit the computational capabilities available. Additionally drawbacks and limitations […]

CUDA

Dec, 18

A GPU based implementation of Center-Surround Distribution Distance for feature extraction and matching

The release of general-purpose GPU programming environments has given broad access to computing performance that was once available only on supercomputers. The availability of such computational power has fostered the creation and re-deployment of algorithms, new and old, creating entirely new classes of applications. In this paper, a GPU implementation of the Center-Surround Distribution […]

CUDA

Dec, 18

A Single (Unified) Shader GPU Microarchitecture for Embedded Systems

We present and evaluate the TILA-rin GPU microarchitecture for embedded systems using the ATTILA GPU simulation framework. We use a trace from an execution of the Unreal Tournament 2004 PC game to evaluate and compare the performance of the proposed embedded GPU against a baseline GPU architecture for the PC. We evaluate the different […]

OpenGL

Dec, 18

A Cross-Input Adaptive Framework for GPU Programs Optimization

Recent years have seen a trend in using graphics processing units (GPUs) as accelerators for general-purpose computing. The inexpensive, single-chip, massively parallel architecture of the GPU has evidently brought factors of speedup to many numerical applications. However, the development of a high-quality GPU application is challenging, due to the large optimization space and complex unpredictable effects […]

CUDA

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
