high performance computing on graphics processing units: hgpu.org

Posts

Jul, 27

Increasing the Accuracy of the Space-Sweeping Approach to Stereo Reconstruction, using Spherical Backprojection Surfaces

In this paper, interest is focused on accurate and time-efficient stereo reconstruction, for the purpose of generating 3D animated scenes from multiple synchronized videos. The plane-sweeping approach is reviewed as relevant to the goal of time efficiency, since its execution can be optimized on a GPU. A method compatible with optimization on the GPU is […]

Jul, 27

Real-Time Discriminative Background Subtraction

We examine the problem of segmenting foreground objects in live video when background scene textures change over time. In particular, we formulate background subtraction as minimizing a penalized instantaneous risk functional, yielding a local online discriminative algorithm that can quickly adapt to temporal changes. We analyze the algorithm’s convergence, discuss its robustness to nonstationarity, and […]

Jul, 27

Dynamic Shader Generation for Flexible Multi-Volume Visualization

Volume rendering of multiple intersecting volumetric objects is a difficult visualization task, especially if different rendering styles need to be applied to the components, in order to achieve the desired illustration effect. Real-time performance for even complex scenarios is obtained by exploiting the speed and flexibility of modern GPUs, but at the same time programming […]

OpenGL

Jul, 27

A very fast census-based stereo matching implementation on a graphics processing unit

In this paper, a very fast graphics processing unit implementation of a local, census-correlation-based stereo matching algorithm is presented. In comparison to absolute or squared difference correlation techniques, the census transform is computationally more expensive, which motivated a GPU-based implementation. Due to the parallel architecture of modern graphics cards, complex algorithms […]
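For readers unfamiliar with the census transform named in this abstract, the following is a minimal CPU sketch in plain Python, not the paper's GPU implementation. It assumes a 3x3 census window, a single-pixel Hamming cost with no window aggregation, and a rectified image pair; the function names `census_transform`, `hamming_distance`, and `best_disparity` are hypothetical and chosen only for illustration.

```python
def census_transform(image, radius=1):
    """Encode each interior pixel as a bit string: 1 where a neighbor is darker than the center."""
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]  # border pixels keep code 0
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            center = image[y][x]
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the center pixel is not compared with itself
                    bit = 1 if image[y + dy][x + dx] < center else 0
                    code = (code << 1) | bit
            codes[y][x] = code
    return codes


def hamming_distance(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def best_disparity(left_codes, right_codes, y, x, max_disparity):
    """Pick the disparity d that minimizes the Hamming cost between left (x) and right (x - d)."""
    best_d, best_cost = 0, None
    for d in range(max_disparity + 1):
        if x - d < 0:
            break  # candidate would fall outside the right image
        cost = hamming_distance(left_codes[y][x], right_codes[y][x - d])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Each pixel's code and each disparity candidate is independent of the others, which is the property that makes this kind of matching a natural fit for a parallel architecture.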

Jul, 26

Efficient Rasterization for Outdoor Radio Wave Propagation

Conventional beam tracing can be used for solving global illumination problems. It is an efficient algorithm and performs very well when implemented on the GPU. This allows us to apply the algorithm in a novel way to the problem of radio wave propagation. The simulation of radio waves is conceptually analogous to the problem of […]

CUDA

Jul, 26

Scene independent real-time indirect illumination

A novel method for real-time simulation of indirect illumination is presented in this paper. The method, which we call direct radiance mapping (DRM), is based on basal radiance calculations and does not impose any restrictions on scene geometry or dynamics. This makes the method tractable for real-time rendering of arbitrary dynamic environments and for interactive […]

Jul, 26

Data-Aware Task Scheduling on Multi-accelerator Based Platforms

To fully tap into the potential of heterogeneous machines composed of multicore processors and multiple accelerators, simple offloading approaches in which the main trunk of the application runs on regular cores while only specific parts are offloaded to accelerators are not sufficient. The real challenge is to build systems where the application would permanently spread […]

CUDA

Jul, 26

Interactive transparency rendering for large CAD models

Transparency is an important graphics effect that can be used to significantly increase the realism of the rendered scene or to enable more effective visual inspection in engineering visualization. In this paper, we propose achieving interactive transparency rendering of a static scene by sorting the triangles in back-to-front order on CPU and supplying the sorted […]
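The back-to-front ordering mentioned in this abstract can be sketched on the CPU as follows. This is only an illustrative sketch: it assumes each triangle is ordered by the depth of its centroid along the view direction, which is a common simplification and not necessarily the paper's sorting criterion. The names `centroid` and `sort_back_to_front` are hypothetical.

```python
def centroid(triangle):
    """Average of the three vertices, each given as an (x, y, z) tuple."""
    return tuple(sum(vertex[i] for vertex in triangle) / 3.0 for i in range(3))


def sort_back_to_front(triangles, eye, view_dir):
    """Return triangles ordered from farthest to nearest along view_dir.

    Assumption: depth is the projection of (centroid - eye) onto view_dir.
    Intersecting or cyclically overlapping triangles are not handled.
    """
    def depth(triangle):
        c = centroid(triangle)
        return sum((c[i] - eye[i]) * view_dir[i] for i in range(3))

    # Farthest first, so nearer transparent surfaces are blended on top.
    return sorted(triangles, key=depth, reverse=True)
```

For a static scene, an ordering like this only needs to be recomputed when the viewpoint changes.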

Jul, 26

Discontinuous Galerkin Time Domain for Maxwell’s equations on GPUs

In this paper, we discuss our approach to the GPU implementation of the Discontinuous Galerkin Time-Domain (DGTD) method to solve the time-dependent Maxwell’s equations. We exploit the inherent DGTD parallelism and combine the GPU computing capabilities with the benefits of a local time-stepping strategy. The combination results in a significant increase in efficiency and reduction […]

Jul, 26

High-quality surface splatting on today’s GPUs

Point-based geometries evolved into a valuable alternative to surface representations based on polygonal meshes, because of their conceptual simplicity and superior flexibility. Elliptical surface splats were shown to allow for high-quality anti-aliased rendering by sophisticated EWA filtering. Since the publication of the original software-based EWA splatting, several authors tried to map this technique to the […]

OpenGL

Jul, 25

Exploring Novel Parallelization Technologies for 3-D Imaging Applications

Multi-dimensional imaging techniques involve the processing of high-resolution images commonly used in medical, civil and remote-sensing applications. A barrier commonly encountered in this class of applications is the time required to carry out repetitive operations on large matrices. Partitioning these large datasets can help improve performance, and lends the data to more efficient parallel […]

Jul, 25

An energy model for graphics processing units

We present an energy model for a graphics processing unit (GPU) that is based on the amount and type of work performed in various parts of the unit. By designing and running directed tests on a GPU, we measure the energy consumed when performing different arithmetic and memory operations, allowing us to accurately predict the […]

Recent source codes

DITRON: Distributed Compiler based on Triton for Parallel Systems

IntelliKit: Agent-first tooling for AMD hardware

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

Agentic Code Optimization via Compiler-LLM Cooperation

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
