high performance computing on graphics processing units: hgpu.org

Posts

Oct, 12

GPU-Based Translation-Invariant 2D Discrete Wavelet Transform for Image Processing

The Discrete Wavelet Transform (DWT) is applied to various signal and image processing applications. However, the computation is expensive. Therefore, many approaches have been proposed to accelerate it. Graphics processing units (GPUs) can be used as stream processors to speed up the calculation of the DWT. In this paper, we present a […]

OpenGL

Oct, 12

High Performance Computing with Accelerators

High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. HPC has come to be applied to business uses of cluster-based supercomputers, such as data warehouses, line-of-business (LOB) applications, and transaction processing. In the past few years, a new class of HPC systems has emerged. These systems employ unconventional processor architectures, such as […]

Oct, 12

Ray Tracing on Graphics Hardware

Ray tracing is one of the important elements in photo-realistic image synthesis. Since ray tracing is computationally expensive, a large body of research has been devoted to improving its performance. One of the recent developments in efficient ray tracing is its implementation on graphics hardware. Similar to general purpose CPUs, recent graphics […]

CUDA

Oct, 12

Implementing modular arithmetic using OpenCL

Problem description: Most public-key algorithms are based on modular arithmetic. The simplest, and original, implementation of the protocol uses the multiplicative group of integers modulo p, where p is prime and g is a primitive root mod p. This is the way Diffie-Hellman is implemented. RSA is implemented in a similar way: c = m^e mod (p*q). […]

OpenCL
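
The modular arithmetic abstract above mentions two relations: Diffie-Hellman over the multiplicative group modulo a prime p with primitive root g, and RSA encryption c = m^e mod (p*q). As a rough illustration only, and not the listed paper's OpenCL implementation, the sketch below shows square-and-multiply modular exponentiation on the CPU. The function name mod_pow and all numeric values are assumptions chosen for illustration; they are small textbook toy parameters and offer no security.

```python
def mod_pow(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by square-and-multiply.

    Hypothetical helper for illustration; Python's built-in
    pow(base, exponent, modulus) does the same job.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus          # handles modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:          # current low bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next bit
        exponent >>= 1
    return result


# Toy Diffie-Hellman exchange (assumed values: p = 23, g = 5).
p, g = 23, 5
a_secret, b_secret = 6, 15            # private exponents, illustrative only
A = mod_pow(g, a_secret, p)           # public value sent by the first party
B = mod_pow(g, b_secret, p)           # public value sent by the second party
shared_a = mod_pow(B, a_secret, p)    # first party's view of the secret
shared_b = mod_pow(A, b_secret, p)    # second party's view of the secret

# Toy RSA round trip, c = m^e mod (p*q) (assumed primes 61 and 53).
rsa_p, rsa_q = 61, 53
n = rsa_p * rsa_q
e, d = 17, 2753                       # e*d = 1 mod (rsa_p-1)*(rsa_q-1)
message = 65
cipher = mod_pow(message, e, n)
recovered = mod_pow(cipher, d, n)
```

Both parties end with the same shared value, and decrypting the RSA ciphertext recovers the original message. Real implementations work with integers of thousands of bits, which is where multi-precision arithmetic and GPU offload become relevant.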

Oct, 12

General purpose computing on graphics processing units using OpenCL

General-purpose computing on graphics processing units (GPGPU) has been an area of active research for many years. During 2009 and 2010, much happened in the GPGPU research field with the release of the Open Computing Language (OpenCL) programming framework and the new NVIDIA Fermi Graphics Processing Unit (GPU) architecture. This thesis explores the hardware […]

CUDA • OpenCL

Oct, 12

Distance Fields Accelerated with OpenCL

An important task in any graphical simulation is the collision detection between the objects in the simulation. It is desirable to have a good general method for collision detection with high performance. This thesis describes an implementation of a collision detection method that uses distance fields to detect collisions. This method is quite robust and […]

OpenCL

Oct, 12

Cinematic Particle Systems with OpenCL

High-particle-count simulations are becoming increasingly crucial in many different aspects of our world today: both in entertainment – within video games, movies, and the like – and in scientific fields, where particle systems are capable of simulating and visualizing many interesting phenomena. This paper will explore the possibility of parallelizing the simulation of these large […]

OpenCL

Oct, 12

Color Correction Acceleration Using a Color Cube and OpenCL

The article deals with the problem of real-time color correction on modern but not dedicated video hardware, suggesting a new implementation of a fast algorithm for color transformation utilizing 3D look-up tables. We focus on the highly parallel nature of the proposed method and employ the GPU to perform the color calculations side-by-side. The paper is […]

OpenCL

Oct, 12

Evaluating performance and portability of OpenCL programs

Recently, OpenCL, a new open programming standard for GPGPU programming, has become available in addition to CUDA. OpenCL can support various compute devices thanks to its higher-level programming abstraction. Since there is a semantic gap between OpenCL and compute devices, the OpenCL C compiler plays an important role in exploiting the potential of compute devices […]

CUDA • OpenCL

Oct, 11

Real-Time Rigid Body Interactions

Rigid body simulations are useful in many areas, most notably video games and computer animation. However, the requirements for accuracy and performance vary greatly between applications. In this project we combine methods and techniques from different sources to implement a rigid body simulation. The simulation uses a particle representation to approximate objects with the intent […]

OpenCL • OpenGL

Oct, 11

Performance Characterization and Optimization of Atomic Operations on AMD GPUs

Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate execution between concurrent threads, and in turn, assist in constructing complex data structures such as hash tables or implementing GPU-wide barrier synchronization. While the performance of atomic operations has improved substantially on […]

OpenCL

Oct, 11

Performance and Power Analysis of ATI GPU: A Statistical Approach

We present a comprehensive study on the performance and power consumption of a recent ATI GPU. By employing a rigorous statistical model to analyze execution behaviors of representative general-purpose GPU (GPGPU) applications, we conduct insightful investigations on the target GPU architecture. Our results demonstrate that the GPU execution throughput and the power dissipation are dependent […]

OpenCL
