
Posts

Aug, 23

Implementing the PGI Accelerator model

The PGI Accelerator model is a high-level programming model for accelerators, such as GPUs, similar in design and scope to the widely-used OpenMP directives. This paper presents some details of the design of the compiler that implements the model, focusing on the Planner, the element that maps the program parallelism onto the hardware parallelism.
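
As a rough illustration of what the Planner has to decide (a hypothetical sketch, not code from the paper), the CUDA program below shows a doubly nested data-parallel loop lowered by hand onto the grid/block hierarchy; a directive-based compiler derives exactly this kind of mapping, and the schedule (block shape, grid size) chosen manually here is what the Planner selects automatically.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical illustration: the loop nest
//   for (i = 0; i < n; ++i)
//     for (j = 0; j < m; ++j)
//       c[i*m + j] = a[i*m + j] + b[i*m + j];
// mapped onto the GPU's grid/block hierarchy, roughly what a planner-style
// compiler pass generates for an accelerator region.
__global__ void add2d(const float *a, const float *b, float *c, int n, int m)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // outer loop -> block rows
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // inner loop -> threads
    if (i < n && j < m)
        c[i * m + j] = a[i * m + j] + b[i * m + j];
}

int main()
{
    const int n = 256, m = 256;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * m * sizeof(float));
    cudaMallocManaged(&b, n * m * sizeof(float));
    cudaMallocManaged(&c, n * m * sizeof(float));
    for (int k = 0; k < n * m; ++k) { a[k] = 1.0f; b[k] = 2.0f; }

    dim3 block(16, 16);                      // schedule chosen by hand here;
    dim3 grid((m + 15) / 16, (n + 15) / 16); // the Planner picks this automatically
    add2d<<<grid, block>>>(a, b, c, n, m);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);             // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```
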
Aug, 23

MDR: performance model driven runtime for heterogeneous parallel platforms

We present a runtime framework for the execution of workloads represented as parallel-operator directed acyclic graphs (PO-DAGs) on heterogeneous multi-core platforms. PO-DAGs combine coarse-grained parallelism at the graph level with fine-grained parallelism within each node, lending themselves naturally to exploiting the intra- and inter-processing-element parallelism present in heterogeneous platforms. We identify four important criteria […]
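
To make the PO-DAG idea concrete, here is a minimal, hypothetical sketch (names and structure invented for illustration, not taken from the paper) of a DAG of coarse-grained operators and a list scheduler that dispatches a node once its predecessors finish; a real runtime such as MDR would additionally choose the processing element for each node using a performance model.

```cuda
#include <cstdio>
#include <queue>
#include <vector>

// Hypothetical sketch of a parallel-operator DAG (PO-DAG): each node is a
// coarse-grained operator that may itself run data-parallel work on a CPU or
// GPU; edges express dependencies. Names are illustrative only.
struct Node {
    const char *name;
    std::vector<int> succ;   // indices of dependent nodes
    int pending;             // unsatisfied predecessor count
};

int main()
{
    // A small diamond-shaped PO-DAG: 0 -> {1, 2} -> 3
    std::vector<Node> dag = {
        {"load",   {1, 2}, 0},
        {"filter", {3},    1},
        {"fft",    {3},    1},
        {"reduce", {},     2},
    };

    // Simple list scheduler: dispatch any node whose predecessors are done.
    std::queue<int> ready;
    ready.push(0);
    while (!ready.empty()) {
        int n = ready.front(); ready.pop();
        printf("dispatch %s\n", dag[n].name);   // launch kernel / CPU task here
        for (int s : dag[n].succ)
            if (--dag[s].pending == 0) ready.push(s);
    }
    return 0;
}
```
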
Aug, 23

Bounding the effect of partition camping in GPU kernels

Current GPU tools and performance models provide some common architectural insights that guide programmers toward writing optimal code. We challenge and complement these performance models and tools by modeling and analyzing a lesser-known but very severe performance pitfall, called Partition Camping, in NVIDIA GPUs. Partition Camping is caused by memory accesses that are […]
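
A kernel-only sketch of the kind of access pattern involved is shown below (illustrative only, not code from the paper; it assumes GT200-style global memory interleaved across 8 partitions in 256-byte chunks).

```cuda
#include <cuda_runtime.h>

// Illustrative sketch: if every block copies a column of a column-major
// matrix whose leading dimension `ld` makes ld * sizeof(float) a multiple of
// the full partition span (assumed 8 partitions x 256 bytes), nearly all
// accesses from all concurrently active blocks land in the same memory
// partition and serialize behind it, while the other partitions sit idle.
__global__ void column_copy(float *dst, const float *src, int ld, int rows)
{
    int col = blockIdx.x;                              // one block per column
    for (int row = threadIdx.x; row < rows; row += blockDim.x)
        dst[row * ld + col] = src[row * ld + col];     // stride of ld floats
}
```

Known workarounds in the literature reorder or skew block indices (for example, diagonal block reordering in matrix transpose) so that concurrently active blocks start in different partitions.
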
Aug, 23

Ypnos: declarative, parallel structured grid programming

A fully automatic, compiler-driven approach to parallelisation can result in unpredictable time and space costs for compiled code. On the other hand, a fully manual approach to parallelisation can be long, tedious, prone to errors, hard to debug, and often architecture-specific. We present a declarative domain-specific language, Ypnos, for expressing structured grid computations which encourages […]
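
The snippet below is not Ypnos syntax; it is a plain CUDA five-point stencil, included only as an example of the kind of structured grid computation such a DSL lets the programmer state declaratively while the compiler handles decomposition, boundaries and parallelisation.

```cuda
#include <cuda_runtime.h>

// A five-point Jacobi relaxation step on a 2-D grid: the archetypal
// structured grid computation (kernel-only sketch for illustration).
__global__ void jacobi_step(const float *in, float *out, int nx, int ny)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x > 0 && x < nx - 1 && y > 0 && y < ny - 1) {
        out[y * nx + x] = 0.25f * (in[y * nx + x - 1] + in[y * nx + x + 1] +
                                   in[(y - 1) * nx + x] + in[(y + 1) * nx + x]);
    }
}
```
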
Aug, 23

Fast parallel surface and solid voxelization on GPUs

This paper presents data-parallel algorithms for surface and solid voxelization on graphics hardware. First, a novel conservative surface voxelization technique, setting all voxels overlapped by a mesh’s triangles, is introduced, which is up to one order of magnitude faster than previous solutions leveraging the standard rasterization pipeline. We then show how the involved new triangle/box […]
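
As a deliberately simplified, hypothetical sketch of the overall structure (not the paper's algorithm): one thread per triangle, setting bits in a voxel grid. The crude bounding-box step below is where a real conservative voxelizer would instead apply an exact triangle/box overlap test, so that only voxels actually touched by the triangle are set.

```cuda
#include <cuda_runtime.h>

// Kernel-only sketch: rasterize each triangle's axis-aligned bounding box
// into a bit-per-voxel grid (trivially conservative, deliberately crude).
__global__ void voxelize_aabb(const float3 *verts, const int3 *tris, int numTris,
                              unsigned int *grid, int nx, int ny, int nz,
                              float3 origin, float voxelSize)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numTris) return;

    float3 a = verts[tris[t].x], b = verts[tris[t].y], c = verts[tris[t].z];
    // Triangle bounding box in voxel coordinates, clamped to the grid.
    int x0 = max(0, (int)floorf((fminf(a.x, fminf(b.x, c.x)) - origin.x) / voxelSize));
    int y0 = max(0, (int)floorf((fminf(a.y, fminf(b.y, c.y)) - origin.y) / voxelSize));
    int z0 = max(0, (int)floorf((fminf(a.z, fminf(b.z, c.z)) - origin.z) / voxelSize));
    int x1 = min(nx - 1, (int)floorf((fmaxf(a.x, fmaxf(b.x, c.x)) - origin.x) / voxelSize));
    int y1 = min(ny - 1, (int)floorf((fmaxf(a.y, fmaxf(b.y, c.y)) - origin.y) / voxelSize));
    int z1 = min(nz - 1, (int)floorf((fmaxf(a.z, fmaxf(b.z, c.z)) - origin.z) / voxelSize));

    for (int z = z0; z <= z1; ++z)
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                int v = (z * ny + y) * nx + x;             // linear voxel index
                atomicOr(&grid[v >> 5], 1u << (v & 31));   // set one bit per voxel
            }
}
```
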
Aug, 23

High performance content-based matching using GPUs

Matching incoming event notifications against received subscriptions is a fundamental part of every publish-subscribe infrastructure. In the case of content-based systems this is a fairly complex and time-consuming task, whose performance impacts that of the entire system. In the past, several algorithms have been proposed for efficient content-based event matching. While they differ in […]
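
A deliberately naive, kernel-only sketch of GPU content-based matching is shown below (one thread per subscription, a single invented constraint type); real matchers, including the one evaluated here, use much richer constraint encodings and data structures.

```cuda
#include <cuda_runtime.h>

// Illustrative only: each subscription is reduced to one "attribute < threshold"
// constraint, and one thread evaluates one subscription against the event.
struct Constraint { int attr; float threshold; };

__global__ void match(const float *eventAttrs, const Constraint *subs,
                      int numSubs, unsigned char *matched)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s < numSubs)
        matched[s] = eventAttrs[subs[s].attr] < subs[s].threshold ? 1 : 0;
}
```
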
Aug, 23

Workload and network-optimized computing systems

This paper describes a recent system-level trend toward the use of massive on-chip parallelism combined with efficient hardware accelerators and integrated networking to enable new classes of applications and computing-systems functionality. This system transition is driven by semiconductor physics and emerging network-application requirements. In contrast to general-purpose approaches, workload and network-optimized computing provides significant cost, […]
Aug, 23

Efficient implementation of GPGPU synchronization primitives on CPUs

The GPGPU model represents a style of execution where thousands of threads execute in a data-parallel fashion, with a large subset (typically 10s to 100s) needing frequent synchronization. As the GPGPU model evolves to target both GPUs and CPUs as acceleration platforms, thread synchronization becomes an important problem when running on CPUs. CPUs have little hardware […]
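
A minimal sketch of the kind of software primitive involved (a centralized sense-reversing barrier built from C++ atomics; illustrative only, not the paper's implementation) is shown below: on a CPU, something like this has to stand in for the GPU's hardware-supported __syncthreads().

```cuda
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Centralized sense-reversing barrier: the last thread to arrive resets the
// counter and flips the global sense; everyone else spins on the sense flag.
struct SenseBarrier {
    std::atomic<int>  count;
    std::atomic<bool> sense;
    int nthreads;
    explicit SenseBarrier(int n) : count(n), sense(false), nthreads(n) {}

    void wait(bool &localSense) {
        localSense = !localSense;              // flip this thread's sense
        if (count.fetch_sub(1) == 1) {         // last thread to arrive
            count.store(nthreads);             // reset for the next phase
            sense.store(localSense);           // release the waiters
        } else {
            while (sense.load() != localSense) { }   // spin (no backoff here)
        }
    }
};

int main() {
    const int N = 4;
    SenseBarrier bar(N);
    std::vector<std::thread> ts;
    for (int t = 0; t < N; ++t)
        ts.emplace_back([&, t] {
            bool localSense = false;
            for (int phase = 0; phase < 3; ++phase) {
                printf("thread %d, phase %d\n", t, phase);
                bar.wait(localSense);          // emulated __syncthreads()
            }
        });
    for (auto &th : ts) th.join();
    return 0;
}
```
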
Aug, 23

Performance Modelling and Traffic Characterisation of Optical Networks

A review is carried out on the traffic characteristics of an optical carrier’s OC-192 link, based on the IP packet size distribution, traffic burstiness and self-similarity. The generalised exponential (GE) distribution is employed to model the interarrival times of bursty traffic flows of IP packets whilst self-similar traffic is generated for each wavelength of each […]
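
For reference, the GE distribution mentioned here is usually written as a two-phase mixture of a unit mass at zero and an exponential branch; with mean arrival rate $\nu$ and squared coefficient of variation $C^2$ (parameter names assumed for this note):

```latex
F(t) = P(T \le t) = 1 - \tau e^{-\tau \nu t}, \qquad t \ge 0,
\qquad \text{where } \tau = \frac{2}{C^2 + 1}.
```

This gives $E[T] = 1/\nu$ and a squared coefficient of variation of exactly $C^2$, and reduces to the ordinary exponential distribution when $C^2 = 1$.
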
Aug, 22

Auto-tuning 3-D FFT library for CUDA GPUs

Existing implementations of FFTs on GPUs are optimized for specific transform sizes, such as powers of two, and exhibit unstable and peaky performance, i.e., they do not perform as well for other sizes that appear in practice. Our new auto-tuning 3-D FFT on CUDA generates high-performance CUDA kernels for FFTs of varying transform sizes, alleviating this […]
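
For context, the fixed-recipe baseline such an auto-tuner is typically measured against is a single cuFFT plan; a minimal host-side sketch (transform size chosen arbitrarily for illustration, link with -lcufft):

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Single-precision, complex-to-complex, in-place 3-D FFT via cuFFT.
int main()
{
    const int nx = 96, ny = 96, nz = 96;   // a non-power-of-two size
    cufftComplex *data;
    cudaMalloc(&data, sizeof(cufftComplex) * nx * ny * nz);
    cudaMemset(data, 0, sizeof(cufftComplex) * nx * ny * nz);

    cufftHandle plan;
    if (cufftPlan3d(&plan, nx, ny, nz, CUFFT_C2C) != CUFFT_SUCCESS) {
        printf("plan creation failed\n");
        return 1;
    }
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);   // in-place forward transform
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```
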
Aug, 22

Evaluation of streaming aggregation on parallel hardware architectures

We present a case study parallelizing streaming aggregation on three different parallel hardware architectures. Aggregation is a performance-critical operation for data summarization in stream computing, and is commonly found in sense-and-respond applications. Currently available commodity parallel hardware holds promise as an accelerator for streaming aggregation. However, how streaming aggregation can map to the different parallel architectures […]
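
As a reminder of what the operation itself looks like, here is a small, sequential, purely illustrative sketch of a per-key tumbling-window sum (tuple fields and window size invented for the example); the paper's subject is how to map this kind of computation onto parallel architectures, not the aggregation logic itself.

```cuda
#include <cstdio>
#include <map>
#include <vector>

// One stream tuple: arrival time, grouping key, value to aggregate.
struct Tuple { long timestamp; int key; double value; };

int main()
{
    const long windowSize = 10;                 // tumbling window of 10 time units
    std::vector<Tuple> stream = {
        {1, 7, 1.5}, {4, 7, 2.0}, {6, 3, 4.0},  // window [0, 10)
        {12, 7, 1.0}, {15, 3, 3.0},             // window [10, 20)
    };

    long windowEnd = windowSize;
    std::map<int, double> sums;                 // per-key running sums
    for (const Tuple &t : stream) {
        while (t.timestamp >= windowEnd) {      // window boundary: emit and reset
            for (auto &kv : sums)
                printf("window ending %ld: key %d -> %g\n", windowEnd, kv.first, kv.second);
            sums.clear();
            windowEnd += windowSize;
        }
        sums[t.key] += t.value;                 // the aggregation itself
    }
    for (auto &kv : sums)
        printf("window ending %ld: key %d -> %g\n", windowEnd, kv.first, kv.second);
    return 0;
}
```
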
Aug, 22

A taxonomy of accelerator architectures and their programming models

As the clock frequency of silicon chips is leveling off, the computer architecture community is looking for different solutions to continue application performance scaling. One such solution is the multicore approach, i.e., using multiple simple cores that enable higher performance than wide superscalar processors, provided that the workload can exploit the parallelism. Another emerging alternative […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
