high performance computing on graphics processing units: hgpu.org

Posts

Jan, 2

Sinus Endoscopy – Application of Advanced GPU Volume Rendering for Virtual Endoscopy

For difficult cases in endoscopic sinus surgery, careful planning of the intervention is necessary. Due to the reduced field of view during the intervention, the surgeons have less information about the surrounding structures in the working area compared to open surgery. Virtual endoscopy enables the visualization of the operating field and additional information, such […]

OpenGL

Jan, 2

Performing efficient NURBS modeling operations on the GPU

We present algorithms for evaluating and performing modeling operations on NURBS surfaces using the programmable fragment processor on the Graphics Processing Unit (GPU). We extend our GPU-based NURBS evaluator that evaluates NURBS surfaces to compute exact normals for either standard or rational B-spline surfaces for use in rendering and geometric modeling. We build on these […]

Dec, 29

GPU Color Constancy

A sensor located inside a digital camera is only able to measure the light that is reflected by an object. The reflected light varies with the spectral power distribution of the illuminant. Hence, images taken with a digital camera may show a strong color cast if an incorrect white balance setting has been chosen. Such […]

OpenGL

Dec, 29

Wrinkling Coarse Meshes on the GPU

The simulation of complex layers of folds of cloth can be handled through algorithms which take the physical dynamics into account. In many cases, however, it is sufficient to generate wrinkles on a piece of garment which mostly appears spread out. This paper presents a corresponding fully GPU-based, easy-to-control, and robust method to generate and […]

Dec, 29

Scalable GPU rendering of CSG models

Existing methods that are able to interactively render complex CSG objects with the aid of GPUs are both image based and severely bandwidth limited. In this paper we present a new approach to this problem whose main advantage is its capability to efficiently scale the dependency on CPU instruction throughput, memory bandwidth and GPU instruction […]

OpenGL

Dec, 29

Source-to-Source Optimization of CUDA C for GPU Accelerated Cardiac Cell Modeling

Large and complex systems of ordinary differential equations (ODEs) arise in diverse areas of science and engineering, and pose special challenges on a streaming processor owing to the large amount of state they manipulate. We describe a set of domain-specific source transformations on CUDA C that improved performance by a factor of 6.7 on a system of ODEs […]

CUDA

Dec, 29

Benchmarking GPU Devices with N-Body Simulations

Recent developments in processing devices such as graphical processing units and multi-core systems offer opportunities to make use of parallel techniques at the chip level to obtain high performance. We discuss the difficulties in establishing suitable benchmark codes for making comparisons across these device architectures and in a way that is representative of key applications. […]

CUDA

Dec, 29

Image-Space GPU Metaballs for Time-Dependent Particle Data Sets

Molecular dynamics simulations are today a widely used tool in many research fields. Such simulations produce large time-dependent data sets, which need to be interactively visualised allowing efficient exploration. On the other hand, commonly used point-based rendering of the individual particles usually fails to emphasise global contiguous structures like particle clusters. To solve this issue, we want to visualise these […]

OpenGL

Dec, 29

High performance realtime vision for mobile robots on the GPU

We present a real-time vision system designed for and implemented on a graphics processing unit (GPU). After an introduction to GPU programming we describe the architecture of the system and software running on the GPU. We show the advantages of implementing a vision processor on the GPU rather than on a CPU as well […]

OpenGL

Dec, 29

GPU Accelerated Image Registration in Two and Three Dimensions

Medical image registration tasks of large volume datasets, especially in the non-rigid case, often put a heavy burden on computing resources. GPUs are a promising new approach to address computationally intensive image processing tasks. We investigate recently introduced GPU hardware features that accelerate 2D and 3D rigid and non-rigid registration tasks. Our implementation is entirely […]

OpenGL

Dec, 29

PNG1 triangles for tangent plane continuous surfaces on the GPU

Improving the visual appearance of coarse triangle meshes is usually done with graphics hardware with per-pixel shading techniques. Improving the appearance at silhouettes is inherently hard, as shading has only a small influence there and the geometry must be corrected. With the new geometry shader stage released with DirectX 10, the functionality to generate new […]

Dec, 29

Population Parallel GP on the G80 GPU

The availability of low cost powerful parallel graphics cards has stimulated a trend to port GP on Graphics Processing Units (GPUs). Previous works on GPUs have shown evaluation phase speedups for large training cases sets. Using the CUDA language on the G80 GPU, we show it is possible to efficiently interpret several GP programs in […]

CUDA

