high performance computing on graphics processing units: hgpu.org

Posts

Jan, 2

Accelerating Large-Scale Convolutional Neural Networks with Parallel Graphics Multiprocessors

Training convolutional neural networks (CNNs) on large sets of high-resolution images is too computationally intense to be performed on commodity CPUs. Such architectures, however, achieve state-of-the-art results on low-resolution machine vision tasks such as recognition of handwritten characters. We have adapted the inherent multi-level parallelism of CNNs for Nvidia’s CUDA GPU architecture to accelerate the […]

CUDA

Jan, 2

Parallel multigrid preconditioning on graphics processing units (GPUs) for robust power grid analysis

Leveraging the power of today's graphics processing units for robust power grid simulation remains a challenging task. Existing preconditioned iterative methods that require incomplete matrix factorizations cannot be effectively accelerated on GPUs because of their limited hardware resources and data-parallel computing model. This work presents an efficient GPU-based multigrid preconditioning algorithm for […]

CUDA

Jan, 2

Automatic parallelization for graphics processing units

Accelerated graphics cards, or Graphics Processing Units (GPUs), have become ubiquitous in recent years. On the right kinds of problems, GPUs greatly surpass CPUs in terms of raw performance. However, because they are difficult to program, GPUs are used only for a narrow class of special-purpose applications; the raw processing power made available by GPUs […]

Jan, 2

Sinus Endoscopy – Application of Advanced GPU Volume Rendering for Virtual Endoscopy

For difficult cases in endoscopic sinus surgery, careful planning of the intervention is necessary. Due to the reduced field of view during the intervention, the surgeons have less information about the surrounding structures in the working area compared to open surgery. Virtual endoscopy enables the visualization of the operating field and additional information, such […]

OpenGL

Jan, 2

Performing efficient NURBS modeling operations on the GPU

We present algorithms for evaluating and performing modeling operations on NURBS surfaces using the programmable fragment processor on the Graphics Processing Unit (GPU). We extend our GPU-based NURBS evaluator that evaluates NURBS surfaces to compute exact normals for either standard or rational B-spline surfaces for use in rendering and geometric modeling. We build on these […]

Dec, 29

GPU Color Constancy

A sensor located inside a digital camera is only able to measure the light that is reflected by an object. The reflected light varies with the spectral power distribution of the illuminant. Hence, images taken with a digital camera may show a strong color cast if an incorrect white balance setting has been chosen. Such […]

OpenGL

Dec, 29

Wrinkling Coarse Meshes on the GPU

The simulation of complex layers of folds of cloth can be handled through algorithms which take the physical dynamics into account. In many cases, however, it is sufficient to generate wrinkles on a piece of garment which mostly appears spread out. This paper presents a corresponding fully GPU-based, easy-to-control, and robust method to generate and […]

Dec, 29

Scalable GPU rendering of CSG models

Existing methods that are able to interactively render complex CSG objects with the aid of GPUs are both image based and severely bandwidth limited. In this paper we present a new approach to this problem whose main advantage is its capability to efficiently scale the dependency on CPU instruction throughput, memory bandwidth and GPU instruction […]

OpenGL

Dec, 29

Source-to-Source Optimization of CUDA C for GPU Accelerated Cardiac Cell Modeling

Large and complex systems of ordinary differential equations (ODEs) arise in diverse areas of science and engineering, and pose special challenges on a streaming processor owing to the large amount of state they manipulate. We describe a set of domain-specific source transformations on CUDA C that improved performance by a factor of 6.7 on a system of ODEs […]

CUDA

Dec, 29

Benchmarking GPU Devices with N-Body Simulations

Recent developments in processing devices such as graphical processing units and multi-core systems offer opportunities to make use of parallel techniques at the chip level to obtain high performance. We discuss the difficulties in establishing suitable benchmark codes for making comparisons across these device architectures and in a way that is representative of key applications. […]

CUDA

Dec, 29

Image-Space GPU Metaballs for Time-Dependent Particle Data Sets

Molecular dynamics simulations are today a widely used tool in many research fields. Such simulations produce large time-dependent data sets, which need to be interactively visualised allowing efficient exploration. On the other hand, commonly used point-based rendering of the individual particles usually fails to emphasise global contiguous structures like particle clusters. To solve this issue, we want to visualise these […]

OpenGL

Dec, 29

High performance realtime vision for mobile robots on the GPU

We present a real-time vision system designed for and implemented on a graphics processing unit (GPU). After an introduction to GPU programming, we describe the architecture of the system and the software running on the GPU. We show the advantages of implementing a vision processor on the GPU rather than on a CPU as well […]

OpenGL

Recent source codes

DITRON: Distributed Compiler based on Triton for Parallel Systems

IntelliKit: Agent-first tooling for AMD hardware

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

Agentic Code Optimization via Compiler-LLM Cooperation

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
