high performance computing on graphics processing units: hgpu.org

Posts

Nov, 5

Solving Path Problems on the GPU

We consider the computation of shortest paths on Graphic Processing Units (GPUs). The blocked recursive elimination strategy we use is applicable to a class of algorithms (such as all-pairs shortest-paths, transitive closure, and LU decomposition without pivoting) having similar data access patterns. Using the all-pairs shortest-paths problem as an example, we uncover potential gains over […]

CUDA

Nov, 5

Parallel search on video cards

Recent approaches exploiting the massively parallel architecture of graphics processors (GPUs) to accelerate database operations have achieved intriguing results. While parallel sorting received significant attention, parallel search has not been explored. With p-ary search we present a novel parallel search algorithm for large-scale database index operations that scales with the number of processors and outperforms […]

CUDA

Nov, 5

A Performance Comparison of CUDA and OpenCL

CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, […]

CUDA

•

OpenCL

Nov, 5

Faster matrix-vector multiplication on GeForce 8800GTX

Recently a GPU has acquired programmability to perform general purpose computation fast by running ten thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on NVIDIA CUDA architecture. The experimental results on GeForce 8800GTX show that the proposed algorithm runs maximum 15.69 (resp., 32.88) times faster than the sgemv routine […]

CUDA

Nov, 5

Acceleration of direct volume rendering with programmable graphics hardware

We propose a method to accelerate direct volume rendering using programmable graphics hardware (GPU). In the method, texture slices are grouped together to form a texture slab. Rendering non-empty slabs from front to back viewing order generates the resultant image. Considering each pixel of the image as a ray, slab silhouette maps (SSMs) are used […]

OpenGL

Nov, 5

Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors

This paper presents a graphics processor based implementation of the Finite Difference Time Domain (FDTD), which uses a central finite differencing scheme for solving Maxwell’s equations for electromagnetics. FDTD simulations can be very computationally expensive and require thousands of CPU hours to solve on traditional general purpose processors. Modern Graphics Processing Units (GPUs) found in […]

OpenGL

Nov, 5

Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx […]

OpenGL

Nov, 5

Exploiting frame-to-frame coherence for accelerating high-quality volume raycasting on graphics hardware

GPU-based raycasting offers an interesting alternative to conventional slice-based volume rendering due to the inherent flexibility and the high quality of the generated images. Recent advances in graphics hardware allow for the ray traversal and volume sampling to be executed on a per-fragment level completely on the GPU leading to interactive framerates. In this work […]

OpenGL

Nov, 5

GPUTeraSort: high performance graphics co-processor sorting for large database management

We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces […]

Nov, 4

2D/3D image registration on the GPU

We present a method that performs a rigid 2D/3D image registration efficiently on the Graphical Processing Unit (GPU). As one main contribution of this paper, we propose an efficient method for generating realistic DRRs that are visually similar to x-ray images. Therefore, we model some of the electronic post-processes of current x-ray C-arm-systems. As another […]

OpenGL

Nov, 4

Treecode and fast multipole method for N-body simulation with CUDA

Due to the variety and importance of applications of treecodes and FMM, the combination of algorithmic acceleration with hardware acceleration can have tremendous impact. Alas, programming these algorithms efficiently is no piece of cake. In this contribution, we aim to present GPU kernels for treecode and FMM in, as much as possible, an uncomplicated, accessible […]

CUDA

Nov, 4

Simple dynamic LOD for geometry images

We present a new approach for dynamic LOD processing for geometry images (GIs) in the graphics processing unit (GPU). A GI mipmap is constructed from a scanned 3D model, then a mipmap selector map is created from the camera position’s information and the GI-mipmap. Using the mipmap selector map and the current camera position, some […]

OpenGL