high performance computing on graphics processing units: hgpu.org

Posts

Jan, 13

Fitting Galaxies on GPUs

Structural parameters are normally extracted from observed galaxies by fitting analytic light profiles to the observations. Obtaining accurate fits to high-resolution images is a computationally expensive task, requiring many model evaluations and convolutions with the imaging point spread function. While these algorithms contain high degrees of parallelism, current implementations do not exploit this property. With […]

CUDA

Jan, 13

Hardware-Assisted Projected Tetrahedra

We present a flexible and highly efficient hardware-assisted volume renderer grounded on the original Projected Tetrahedra (PT) algorithm. Unlike recent similar approaches, our method is exclusively based on the rasterization of simple geometric primitives and takes full advantage of graphics hardware. Both vertex and geometry shaders are used to compute the tetrahedral projection, while the […]

CUDA

Jan, 13

Accelerating SSL with GPUs

SSL/TLS is a standard protocol for secure Internet communication. Despite its great success, today’s SSL deployment is largely limited to security-critical domains. The low adoption rate of SSL is mainly due to high computation overhead on the server side. In this paper, we propose Graphics Processing Units (GPUs) as a new source of computing power […]

CUDA

Jan, 13

GPU-PIV

Digital Particle Image Velocimetry (PIV) is an optical technique used to measure the velocity of seeded particles in real flow. A CCD camera captures the flow field twice under exposure to a short duration laser flash. Recorded image pairs are cross-correlated to extract velocity information from these records. Time resolved PIV technology can capture images […]

OpenGL
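The cross-correlation step mentioned in the GPU-PIV abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `best_shift`, the `max_shift` parameter, and the brute-force search over integer shifts are assumptions made for illustration. Each candidate shift is scored independently of the others, which is the kind of work that maps naturally onto parallel hardware.

```python
def best_shift(win_a, win_b, max_shift=2):
    """Return the integer (dy, dx) shift that maximizes the direct
    cross-correlation between two equally sized 2D windows.

    Hypothetical helper for illustration; real PIV codes typically use
    FFT-based correlation and sub-pixel peak fitting.
    """
    h, w = len(win_a), len(win_a[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    # Only overlapping pixels contribute to the score.
                    if 0 <= yy < h and 0 <= xx < w:
                        score += win_a[y][x] * win_b[yy][xx]
            # Strict comparison keeps the first shift found on ties.
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


# One bright particle at row 1, column 1 in the first exposure,
# found at row 2, column 3 in the second exposure.
first = [[0.0] * 5 for _ in range(5)]
second = [[0.0] * 5 for _ in range(5)]
first[1][1] = 1.0
second[2][3] = 1.0
displacement = best_shift(first, second)  # (1, 2)
```

Dividing the displacement by the time between the two laser flashes would give a velocity estimate for that window.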

Jan, 13

Rapid evaluation and evolution of neural models using graphics card hardware

This paper compares three common evolutionary algorithms and our modified GA, a Distributed Adaptive Genetic Algorithm (DAGA). The optimal approach is sought to adapt, in near real-time, biological model behaviour to that of real biology within a laboratory. Near real-time adaptation is achieved with a Graphics Processing Unit (GPU). This, together with evolutionary computation, enables […]

CUDA

Jan, 13

A Scalable and Reconfigurable Shared-Memory Graphics Cluster Architecture

If the computational demands of an interactive graphics rendering application cannot be met by a single commodity Graphics Processing Unit (GPU), multiple graphics accelerators may be utilised on multi-GPU based systems such as SLI [1] or Crossfire [2] or by a cluster of PCs in conjunction with a software infrastructure. Typically these PC cluster solutions […]

OpenGL

Jan, 13

Speeding up Mutual Information Computation Using NVIDIA CUDA Hardware

We present an efficient method for mutual information (MI) computation between images (2D or 3D) for NVIDIA’s “compute unified device architecture” (CUDA) compatible devices. Efficient parallelization of MI is particularly challenging on a “graphics processor unit” (GPU) due to the need for histogram-based calculation of joint and marginal probability mass functions (pmfs) with large number […]

CUDA
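As a rough illustration of the histogram-based calculation the mutual information abstract refers to, the sketch below estimates MI from a joint histogram and its two marginals. It is a sequential reference under stated assumptions, not the authors' CUDA method: the function name `mutual_information`, the `bins` and `max_value` parameters, and the flat-list image representation are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_value=256):
    """Estimate mutual information (in bits) between two equally sized
    sequences of pixel intensities in the range [0, max_value)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and equally sized")
    width = max_value / bins
    # Quantize each intensity into a histogram bin.
    qa = [min(int(v / width), bins - 1) for v in a]
    qb = [min(int(v / width), bins - 1) for v in b]
    n = len(a)
    joint = Counter(zip(qa, qb))  # joint histogram counts
    count_a = Counter(qa)         # marginal histogram of a
    count_b = Counter(qb)         # marginal histogram of b
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        # p_ij / (p_i * p_j) simplifies to c * n / (count_a[i] * count_b[j]).
        mi += p_ij * math.log2(c * n / (count_a[i] * count_b[j]))
    return mi


# An image compared with itself carries its full entropy (1 bit here).
img = [0, 255] * 4
mi_self = mutual_information(img, img)

# Two statistically independent images share no information.
mi_indep = mutual_information([0, 0, 255, 255], [0, 255, 0, 255])
```

The joint histogram is the contended data structure: many pixels update the same bins, which is one reason this step is hard to parallelize efficiently.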

Jan, 13

Adaptive sampling in three dimensions for volume rendering on GPUs

Direct volume rendering of large volumetric data sets on programmable graphics hardware is often limited by the amount of available graphics memory and the bandwidth from main memory to graphics memory. Therefore, several approaches to volume rendering from compact representations of volumetric data have been published that avoid most of the data transfer between main […]

OpenGL

Jan, 13

Low-cost, high-speed computer vision using NVIDIA’s CUDA architecture

In this paper, we introduce real-time image processing techniques using modern programmable Graphics Processing Units (GPUs). GPUs are SIMD (Single Instruction, Multiple Data) devices that are inherently data-parallel. By utilizing NVIDIA’s new GPU programming framework, “Compute Unified Device Architecture” (CUDA), as a computational resource, we realize significant acceleration in image processing algorithm computations. We […]

CUDA

Jan, 13

Efficient fault simulation on many-core processors

Fault simulation is essential in test generation, design for test and reliability assessment of integrated circuits. Reliability analysis and the simulation of self-test structures are particularly computationally expensive as a large number of patterns has to be evaluated. In this work, we propose to map a fault simulation algorithm based on the parallel-pattern single-fault propagation […]

CUDA

Jan, 13

GPU-Based 3D Texture Advection for the Visualization of Unsteady Flow Fields

We present an interactive visualization approach for the dense representation of unsteady 3D flow fields. The first part of this approach is a GPU-based 3D texture advection scheme that allows a slice of the 3D visual representation to be updated in a single rendering pass. In the second step, the result of the advection process […]

OpenGL

Jan, 12

K-Means on Commodity GPUs with CUDA

The K-means algorithm is one of the most famous unsupervised clustering algorithms. Many theoretical improvements to the performance of the original algorithm have been put forward, but almost all of them are based on single instruction single data (SISD) architecture processors (CPUs), which partly ignores the inherently parallel character of the algorithm. In this paper, a novel […]

CUDA
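The parallelism the K-means abstract alludes to is easiest to see in the standard Lloyd iteration, sketched below sequentially. This is a generic textbook formulation and not the paper's algorithm; the helper names `assign`, `update`, and `kmeans` are hypothetical. In the assignment step each point is handled independently of every other point, so it could in principle be given to its own GPU thread.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid.
    Every point is processed independently (the data-parallel step)."""
    labels = []
    for p in points:
        dists = [sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # leave an empty cluster where it is
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, labels = kmeans(points, [(0, 0), (10, 0)])
# centers == [(0.0, 1.0), (10.0, 1.0)], labels == [0, 0, 1, 1]
```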
