high performance computing on graphics processing units: hgpu.org

Posts

Dec, 28

NVIDIA CUDA software and GPU parallel computing architecture

In the past, graphics processors were special purpose hardwired application accelerators, suitable only for conventional rasterization-style graphics applications. Modern GPUs are now fully programmable, massively parallel floating point processors. This talk will describe NVIDIA’s massively multithreaded computing architecture and CUDA software for GPU computing. The architecture is a scalable, highly parallel architecture that delivers high […]

CUDA

Dec, 28

Interactively Rendering Dynamic Caustics on GPU

In this paper, a new technique is presented for interactive rendering of caustics fully processed on GPU. Without any pre-computation required, the algorithm can directly render refractive caustics from complex deformable transparent objects onto an opaque receiver surface. With this technique, we accurately trace the path of the photons and calculate the energy carried by […]

OpenGL

Dec, 28

A low-power handheld GPU using logarithmic arithmetic and triple DVFS power domains

In this paper, a low-power GPU architecture is described for the handheld systems with limited power and area budgets. The GPU is designed using logarithmic arithmetic for power- and area-efficient design. For this GPU, a multifunction unit is proposed based on the hybrid number system of floating-point and logarithmic numbers and the matrix, vector, and […]

OpenGL

Dec, 28

GPU Accelerated Gesture Detection for Real Time Interaction

Over the past years, the interaction between humans and computers (HCI) evolved to one of the most important research topics in computer science. Therefore, finding a way for an intuitive, easy and affordable interaction is the main challenge. Optical markerless tracking using consumer hardware can satisfy these problems. However, in order to be able to […]

Dec, 28

A GPU Sub-pixel Algorithm for Autostereoscopic Virtual Reality

Autostereoscopic displays enable unencumbered immersive virtual reality, but at a significant computational expense. This expense impacts the feasibility of autostereo displays in high-performance real-time interactive applications. A new autostereo rendering algorithm named autostereo combiner addresses this problem using the programmable vertex and fragment pipelines of modern graphics processing units (GPUs). This algorithm is applied to […]

OpenGL

Dec, 28

Fast Hydraulic Erosion Simulation and Visualization on GPU

Natural mountains and valleys are gradually eroded by rainfall and river flows. Physically-based modeling of this complex phenomenon is a major concern in producing realistic synthesized terrains. However, despite some recent improvements, existing algorithms are still computationally expensive, leading to a time-consuming process fairly impractical for terrain designers and 3D artists. In this paper, we […]

OpenGL

Dec, 28

Simulation of deformable environment with haptic feedback on GPU

Interactive simulations of deformable bodies are a growing research area with possible applications in several fields, e.g. computer-aided surgery. The main implementation issue is to mimic the real behavior of the body at the extremely high rates required by haptic devices. Since even high-end computers have inadequate performance, one possible solution is to exploit […]

CUDA

Dec, 28

GPU acceleration of numerical weather prediction

Weather and climate prediction software has enjoyed the benefits of exponentially increasing processor power for almost 50 years. Even with the advent of large-scale parallelism in weather models, much of the performance increase has come from increasing processor speed rather than increased parallelism. This free ride is nearly over. Recent results also indicate that simply […]

CUDA

Dec, 27

Cellular automaton for ultra-fast watershed transform on GPU

In this paper we describe a cellular automaton (CA) used to perform the watershed transform in N-D images. Our method is based on image integration via the Ford-Bellman shortest paths algorithm. Due to the local nature of CA algorithms we show that they are designed to run on massively parallel processors and therefore, be efficiently […]

Dec, 27

Using Reconfigurable Logic to Optimise GPU Memory Accesses

Memory access patterns common in video processing algorithms, which are unsuited to the GPU (Graphics Processing Unit) memory system, are identified. We develop REDA (Reconfigurable Engine for Data Access) to improve GPU performance for such access patterns, by employing reconfigurable logic for address mapping. It is shown that a sixty times reduction in number of […]

Dec, 27

BVH for efficient raytracing of dynamic metaballs on GPU

Metaballs [Bloomenthal 1997] are effective for representing fluids and similar complex, deformable geometries, but their implicit nature makes them difficult to visualize in real time. A common strategy is to tessellate the resulting isosurface and to render it on GPU, but it scales poorly as the number of metaballs increases. Kanamori et al. [2008] efficiently […]

CUDA

Dec, 27

GPU Supported Patch-Based Tessellation for Dual Subdivision

A novel patch-based tessellation method for a dual subdivision scheme, the Doo-Sabin subdivision, is presented. Patch-based refinement for face-split subdivision schemes such as Catmull-Clark subdivision or Loop subdivision has been widely studied. But there is no patch-based tessellation algorithm for dual subdivision scheme yet. The method presented in this paper is the first attempt to […]

CUDA

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
