high performance computing on graphics processing units: hgpu.org

Posts

Dec, 6

Ray-Casted BlockMaps for Large Urban Models Visualization

We introduce a GPU-friendly technique that efficiently exploits the highly structured nature of urban environments to ensure rendering quality and interactive performance of city exploration tasks. Central to our approach is a novel discrete representation, called BlockMap, for the efficient encoding and rendering of a small set of textured buildings far from the viewer. A […]

OpenGL

Dec, 6

Animating physically based explosions in real-time

We present a framework for real-time animation of explosions that runs completely on the GPU. The simulation allows for arbitrary internal boundaries and is governed by a combustion process, a Stable Fluid solver that includes thermal expansion, and turbulence modeling. The simulation results are visualised by two particle systems rendered using animated textures. The results […]

Dec, 6

Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems

By using a combination of 32-bit and 64-bit floating-point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here applies not only to conventional processors but also to exotic technologies such as Field Programmable Gate […]
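A minimal sketch of the idea in the abstract, in pure Python so it runs without external libraries. The O(n^3) LU factorization and the triangular solves are rounded to 32-bit floats (emulated with `struct`), while the residual and the solution update stay in 64-bit Python floats. The function names (`f32`, `lu_factor_f32`, `lu_solve_f32`, `refine`), the test matrix, and the stopping tolerance are illustrative assumptions, not the authors' code, and refinement only converges when the matrix is well enough conditioned for single precision.

```python
import struct


def f32(x):
    """Round a binary64 Python float to the nearest binary32 value (emulated float32)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting (PA = LU), rounding every result to float32."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))  # perm[i] = original row now stored in row i
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve LUx = Pb by forward and back substitution in emulated float32."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):
        s = f32(b[perm[i]])
        for j in range(i):
            s = f32(s - f32(lu[i][j] * y[j]))
        y[i] = s  # L has a unit diagonal
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i]
        for j in range(i + 1, n):
            s = f32(s - f32(lu[i][j] * x[j]))
        x[i] = f32(s / lu[i][i])
    return x


def refine(a, b, max_iter=10, tol=1e-14):
    """Mixed-precision iterative refinement: float32 factor/solve, float64 residual/update."""
    n = len(a)
    lu, perm = lu_factor_f32(a)    # the O(n^3) work, done once in single precision
    x = lu_solve_f32(lu, perm, b)  # initial single-precision solution
    bnorm = max(abs(v) for v in b)
    for _ in range(max_iter):
        # residual r = b - A x in double precision (plain Python floats)
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * bnorm:
            break
        d = lu_solve_f32(lu, perm, r)          # correction reuses the float32 factors
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in double


    return x


# Hypothetical well-conditioned 3x3 system with known solution [1, -2, 3].
A = [[4.1, 1.3, 0.7], [1.2, 5.3, 0.9], [0.6, 1.1, 6.7]]
x_true = [1.0, -2.0, 3.0]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]
x = refine(A, b)
```

On real hardware the payoff comes from running the factorization at single-precision speed; this emulation only demonstrates that a few cheap double-precision residual corrections recover 64-bit accuracy.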

Dec, 6

Virtual open heart surgery: obtaining models suitable for surgical simulation

We present a pre-processing strategy including imaging, segmentation, and model reconstruction that is well suited for previously published GPU-accelerated techniques for surgical simulation. In particular we describe these modeling steps as a prerequisite for our virtual open heart surgery simulator. A short description including relevant references is presented for each of the steps.

Dec, 6

Implementation and performance evaluation of reconstruction algorithms on graphics processors

The high-throughput needs in electron tomography and in single particle analysis have driven the parallel implementation of several reconstruction algorithms and software packages on computing clusters. Here, we report on the implementation of popular reconstruction algorithms such as weighted backprojection, the simultaneous iterative reconstruction technique (SIRT) and the simultaneous algebraic reconstruction technique (SART) on common graphics processors (GPUs). […]
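SIRT updates every pixel at once from the residuals of all rays, x ← x + λ C Aᵀ R (b − A x), where R and C divide by the row and column sums of a nonnegative projection matrix A. The toy below reconstructs a hypothetical 2×2 image from six ray sums (two rows, two columns, two diagonals); the matrix, the relaxation factor `relax`, and the iteration count are illustrative assumptions, not values from the paper. SART differs mainly in applying this kind of update one projection angle at a time rather than with all rays simultaneously.

```python
def sirt(A, b, iters=200, relax=1.0):
    """SIRT for a nonnegative projection matrix A (rays x pixels) and measured ray sums b."""
    m, n = len(A), len(A[0])
    row_sum = [sum(A[i]) for i in range(m)]                       # R = diag(1 / row_sum)
    col_sum = [sum(A[i][j] for i in range(m)) for j in range(n)]  # C = diag(1 / col_sum)
    x = [0.0] * n
    for _ in range(iters):
        # forward-project the current image and normalise each ray's residual
        r = [(b[i] - sum(A[i][j] * x[j] for j in range(n))) / row_sum[i] for i in range(m)]
        # back-project all normalised residuals simultaneously, then normalise per pixel
        x = [x[j] + relax * sum(A[i][j] * r[i] for i in range(m)) / col_sum[j]
             for j in range(n)]
    return x


# Hypothetical 2x2 image, pixels ordered [x00, x01, x10, x11], probed by six rays.
A = [
    [1, 1, 0, 0],  # row 0
    [0, 0, 1, 1],  # row 1
    [1, 0, 1, 0],  # column 0
    [0, 1, 0, 1],  # column 1
    [1, 0, 0, 1],  # main diagonal
    [0, 1, 1, 0],  # anti-diagonal
]
true_image = [1.0, 2.0, 3.0, 4.0]
b = [sum(a * v for a, v in zip(row, true_image)) for row in A]
recon = sirt(A, b)
```

Each pixel's update is an independent sum over rays, which is the kind of per-element work that maps naturally onto GPU threads.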

Dec, 6

AES Encryption Implementation and Analysis on Commodity Graphics Processing Units

Graphics Processing Units (GPUs) offer large potential performance gains over the standard CPU in stream processing applications. These performance gains are best realised when high computational intensity is required across large amounts of mostly independent input elements. The GPU’s success in general-purpose stream processing has been demonstrated in many diverse fields, though attempts to […]

OpenGL

Dec, 6

Multi-fragment effects on the GPU using the k-buffer

Many interactive rendering algorithms require operations on multiple fragments (i.e., ray intersections) at the same pixel location; however, current Graphics Processing Units (GPUs) capture only a single fragment per pixel. Example effects include transparency, translucency, constructive solid geometry, depth-of-field, direct volume rendering, and isosurface visualization. With current GPUs, programmers implement these effects using multiple passes […]

OpenGL
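The core k-buffer operation is a per-pixel read-modify-write: each incoming fragment is merged into a small depth-sorted array, and once more than k are stored one entry is dropped, after which effects such as transparency can be resolved from the sorted fragments. The class below models a single pixel on the CPU using one possible policy, keeping the k nearest fragments. The name `KBuffer`, the RGBA tuple layout, and the front-to-back "over" compositing step are assumptions for illustration and do not reproduce how the paper stores the buffer in GPU memory.

```python
import bisect


class KBuffer:
    """CPU model of one pixel's k-buffer: keeps the k nearest fragments seen so far."""

    def __init__(self, k):
        self.k = k
        self.frags = []  # list of (depth, (r, g, b, a)) sorted by increasing depth

    def insert(self, depth, rgba):
        # read-modify-write: place the fragment in depth order ...
        bisect.insort(self.frags, (depth, rgba))
        # ... and discard the farthest one once more than k are stored
        if len(self.frags) > self.k:
            self.frags.pop()

    def composite(self):
        """Front-to-back 'over' compositing of the stored (non-premultiplied) fragments."""
        color = [0.0, 0.0, 0.0]
        alpha = 0.0
        for _, (r, g, b, a) in self.frags:
            w = (1.0 - alpha) * a  # contribution left after nearer fragments
            color = [c + w * s for c, s in zip(color, (r, g, b))]
            alpha += w
        return color, alpha


# Fragments arrive in arbitrary order, as they would from the rasterizer.
px = KBuffer(k=2)
px.insert(0.9, (0.0, 0.0, 1.0, 1.0))  # far opaque blue, arrives first
px.insert(0.1, (1.0, 0.0, 0.0, 0.5))  # near translucent red
px.insert(0.5, (0.0, 1.0, 0.0, 0.5))  # middle translucent green
color, alpha = px.composite()
```

With k = 2 the opaque blue fragment is discarded, so the result is only 75% opaque; this illustrates the accuracy/memory trade-off of choosing k.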

Dec, 6

Real-time hair simulation on GPU with a dynamic wisp model

In this paper, we present a method for real-time hair animation. We combine a conventional particle-based dynamic simulation and a dynamic hair generation technique. First, the movements of a small number of hairs (coarse model) are simulated using a dynamic simulation. Since this stage uses only a small number of hairs, the simulation is quick. […]
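The abstract only says that the coarse hairs use a conventional particle-based dynamic simulation. One common choice, assumed here, models each guide hair as a chain of particles advanced with Verlet integration and then projected back to fixed segment lengths, with the root pinned to the scalp. The function `simulate_strand`, its 2D setting, the damping factor, and all parameter values are hypothetical.

```python
import math


def simulate_strand(n=8, seg=0.1, steps=600, dt=1 / 60, gravity=-9.81,
                    damping=0.98, iters=10):
    """Hypothetical 2D guide-hair strand: Verlet particles, pinned root, fixed segment lengths."""
    pos = [[i * seg, 0.0] for i in range(n)]  # strand starts horizontal, root at the origin
    prev = [p[:] for p in pos]                 # previous positions encode the velocity
    for _ in range(steps):
        # Verlet step with simple velocity damping; particle 0 (the root) never moves
        for i in range(1, n):
            x, y = pos[i]
            vx = (x - prev[i][0]) * damping
            vy = (y - prev[i][1]) * damping
            prev[i] = [x, y]
            pos[i] = [x + vx, y + vy + gravity * dt * dt]
        # project every segment back to its rest length, pair by pair
        for _ in range(iters):
            for i in range(n - 1):
                (ax, ay), (bx, by) = pos[i], pos[i + 1]
                dx, dy = bx - ax, by - ay
                dist = math.hypot(dx, dy)
                if dist == 0.0:
                    continue
                c = (dist - seg) / dist
                if i == 0:  # root pinned: only the child moves
                    pos[i + 1] = [bx - dx * c, by - dy * c]
                else:       # split the correction between both particles
                    pos[i] = [ax + 0.5 * dx * c, ay + 0.5 * dy * c]
                    pos[i + 1] = [bx - 0.5 * dx * c, by - 0.5 * dy * c]
    return pos


strand = simulate_strand()
```

Because only a few guide strands are simulated this way, the cost stays small; the second stage described in the abstract, generating the dense hair from these strands, is not sketched here.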

Dec, 6

Dynamic deformation textures: GPU-accelerated simulation of deformable models in contact

We present an efficient algorithm for simulating contacts between deformable bodies with high-resolution surface geometry using dynamic deformation textures, which reformulate the 3D elastoplastic deformation and collision handling on a 2D parametric atlas to reduce the extremely high number of degrees of freedom arising from large contact regions and high-resolution geometry. Such computationally challenging dynamic […]

Dec, 6

Towards multi-GPU support for visualization

At the Institute for Ultrascale Visualization, we are tackling the broad problem of building visualization solutions for the petascale age and beyond. As computing transitions into a new age where scalar solutions no longer improve in performance, and parallel solutions are the vehicle for future performance gains, one key challenge in our effort is to […]

CUDA

Dec, 6

A Duality Based Approach for Realtime TV-L1 Optical Flow

Variational methods are among the most successful approaches to calculate the optical flow between two image frames. A particularly appealing formulation is based on total variation (TV) regularization and the robust L1 norm in the data fidelity term. This formulation can preserve discontinuities in the flow field and offers an increased robustness against illumination changes, […]

OpenGL
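For reference, the TV-L1 formulation described above is usually written as the energy below, where I_0 and I_1 are the two frames, u = (u_1, u_2) is the flow field over the image domain Ω, and λ balances the L1 data term against the TV regularizer. The notation is ours, and the duality-based minimization scheme named in the title is not reproduced here.

```latex
E(\mathbf{u}) = \int_{\Omega} \Big( \lambda \, \big| I_1(\mathbf{x} + \mathbf{u}(\mathbf{x})) - I_0(\mathbf{x}) \big|
  + |\nabla u_1(\mathbf{x})| + |\nabla u_2(\mathbf{x})| \Big) \, d\mathbf{x}
```

The TV terms allow the flow to stay piecewise smooth while keeping sharp motion boundaries, and the L1 data term penalizes large brightness mismatches less severely than a quadratic term would.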

Dec, 6

Real Time Capture of Audio Images and their Use with Video

Spherical microphone arrays make it possible to compute the acoustical intensity corresponding to different spatial directions in a given frame of audio data. These intensities can be displayed as an image, and these images can be updated at a high frame rate to produce a video stream if the data capture and intensity computations can be performed sufficiently […]

CUDA
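The abstract does not say which beamformer is used. As a stand-in, the sketch below uses plain narrowband delay-and-sum (steered response power): for each image pixel, i.e. each look direction, the microphone signals at one frequency are phase-aligned for a plane wave from that direction and summed, and the output power becomes the pixel intensity. Every pixel is independent, which is what makes the computation a natural fit for a GPU. The array geometry, frequency, speed of sound, and function names are assumptions; spherical arrays are often processed with spherical-harmonic beamformers instead.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def unit(az, el):
    """Unit look-direction vector from azimuth and elevation in radians."""
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def steering(mics, direction, k):
    """Relative phase of a plane wave from `direction` at each mic (e^{+i omega t} convention)."""
    return [cmath.exp(1j * k * sum(d * r for d, r in zip(direction, m))) for m in mics]


def audio_image(mics, snapshot, freq, azimuths, elevations):
    """Delay-and-sum power |a(d)^H x|^2 / M^2 for every (elevation, azimuth) pixel."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND
    image = []
    for el in elevations:
        row = []
        for az in azimuths:  # each pixel is independent (e.g. one GPU thread per pixel)
            a = steering(mics, unit(az, el), k)
            y = sum(ai.conjugate() * xi for ai, xi in zip(a, snapshot)) / len(mics)
            row.append(abs(y) ** 2)
        image.append(row)
    return image


# Hypothetical 6-mic array on a 5 cm sphere (octahedron vertices) and a 500 Hz source.
R = 0.05
mics = [(R, 0, 0), (-R, 0, 0), (0, R, 0), (0, -R, 0), (0, 0, R), (0, 0, -R)]
azimuths = [i * math.pi / 6 for i in range(12)]  # 30-degree azimuth steps
elevations = [0.0]
freq = 500.0
k = 2 * math.pi * freq / SPEED_OF_SOUND
snapshot = steering(mics, unit(azimuths[3], 0.0), k)  # simulated plane wave from 90 degrees
image = audio_image(mics, snapshot, freq, azimuths, elevations)
```

Repeating this per audio frame over a full azimuth/elevation grid yields the image sequence; with a small array at low frequency the peak is broad, so a real system would need more microphones or a sharper beamformer.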

Recent source codes

DITRON: Distributed Compiler based on Triton for Parallel Systems

IntelliKit: Agent-first tooling for AMD hardware

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

Agentic Code Optimization via Compiler-LLM Cooperation

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
