Posts

Nov, 5

Solving Path Problems on the GPU

We consider the computation of shortest paths on Graphics Processing Units (GPUs). The blocked recursive elimination strategy we use is applicable to a class of algorithms (such as all-pairs shortest-paths, transitive closure, and LU decomposition without pivoting) that have similar data access patterns. Using the all-pairs shortest-paths problem as an example, we uncover potential gains over […]
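For readers unfamiliar with the all-pairs shortest-paths problem named in this abstract, the following is a minimal CPU sketch of the classic Floyd-Warshall algorithm. It is not the paper's blocked recursive GPU method; it only illustrates the triple-loop data access pattern that such blocked schemes reorganize. The function name and the sample graph are hypothetical, and the sketch assumes a dense matrix with zeros on the diagonal and no negative cycles.

```python
import math


def all_pairs_shortest_paths(weights):
    """Plain Floyd-Warshall on a dense matrix; math.inf marks a missing edge.

    Assumes weights[i][i] == 0 and no negative cycles (illustrative only).
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):                  # allow vertex k as an intermediate stop
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue                # no path i -> k, nothing to relax
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


INF = math.inf
# Hypothetical 4-vertex directed graph used only for illustration.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
shortest = all_pairs_shortest_paths(graph)
```

The inner two loops touch row `k` and column `k` of the matrix on every pass, which is the access pattern shared with transitive closure and LU decomposition without pivoting.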
Nov, 5

Parallel search on video cards

Recent approaches exploiting the massively parallel architecture of graphics processors (GPUs) to accelerate database operations have achieved intriguing results. While parallel sorting received significant attention, parallel search has not been explored. With p-ary search we present a novel parallel search algorithm for large-scale database index operations that scales with the number of processors and outperforms […]
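The excerpt does not describe p-ary search in detail, so the following is only a sequential sketch under one assumption: that p-ary search generalizes binary search by splitting the current range into p partitions per step, with one processor inspecting each partition boundary. The function name, the default `p`, and the sample data are hypothetical, and the inner loop stands in for what p parallel threads would do at once.

```python
def p_ary_search(sorted_keys, target, p=4):
    """Return an index of target in sorted_keys, or -1 if it is absent.

    Sequential simulation of a p-way search step; requires p >= 2.
    """
    if p < 2:
        raise ValueError("p must be at least 2")
    lo, hi = 0, len(sorted_keys)            # half-open search range [lo, hi)
    while hi - lo > p:
        step = (hi - lo) // p               # partition width, at least 1 here
        # Default to the last partition if no boundary key exceeds the target.
        next_lo, next_hi = lo + (p - 1) * step, hi
        for t in range(1, p):               # on a GPU: one thread per boundary
            boundary = lo + t * step
            if target < sorted_keys[boundary]:
                next_lo, next_hi = lo + (t - 1) * step, boundary
                break
        lo, hi = next_lo, next_hi
    for i in range(lo, hi):                 # at most p candidates remain
        if sorted_keys[i] == target:
            return i
    return -1


keys = list(range(0, 200, 2))               # hypothetical sorted index keys
found = p_ary_search(keys, 42)
missing = p_ary_search(keys, 43)
```

Each step narrows the range by roughly a factor of p instead of 2, which is why the number of steps drops as more processors cooperate on a single lookup.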
Nov, 5

A Performance Comparison of CUDA and OpenCL

CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, […]
Nov, 5

Acceleration of direct volume rendering with programmable graphics hardware

We propose a method to accelerate direct volume rendering using programmable graphics hardware (GPU). In the method, texture slices are grouped together to form a texture slab. Rendering non-empty slabs from front to back viewing order generates the resultant image. Considering each pixel of the image as a ray, slab silhouette maps (SSMs) are used […]
Nov, 5

Faster matrix-vector multiplication on GeForce 8800GTX

GPUs have recently become programmable enough to perform fast general-purpose computation by running tens of thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on the NVIDIA CUDA architecture. The experimental results on a GeForce 8800GTX show that the proposed algorithm runs up to 15.69 (resp., 32.88) times faster than the sgemv routine […]
Nov, 5

Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors

This paper presents a graphics processor based implementation of the Finite Difference Time Domain (FDTD), which uses a central finite differencing scheme for solving Maxwell’s equations for electromagnetics. FDTD simulations can be very computationally expensive and require thousands of CPU hours to solve on traditional general purpose processors. Modern Graphics Processing Units (GPUs) found in […]
Nov, 5

Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx […]
Nov, 5

Exploiting frame-to-frame coherence for accelerating high-quality volume raycasting on graphics hardware

GPU-based raycasting offers an interesting alternative to conventional slice-based volume rendering due to the inherent flexibility and the high quality of the generated images. Recent advances in graphics hardware allow for the ray traversal and volume sampling to be executed on a per-fragment level completely on the GPU leading to interactive framerates. In this work […]
Nov, 5

GPUTeraSort: high performance graphics co-processor sorting for large database management

We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces […]
Nov, 4

2D/3D image registration on the GPU

We present a method that performs rigid 2D/3D image registration efficiently on the Graphics Processing Unit (GPU). As one main contribution of this paper, we propose an efficient method for generating realistic DRRs that are visually similar to x-ray images. To this end, we model some of the electronic post-processes of current x-ray C-arm systems. As another […]
Nov, 4

Treecode and fast multipole method for N-body simulation with CUDA

Due to the variety and importance of applications of treecodes and FMM, the combination of algorithmic acceleration with hardware acceleration can have tremendous impact. Alas, programming these algorithms efficiently is no piece of cake. In this contribution, we aim to present GPU kernels for treecode and FMM in, as much as possible, an uncomplicated, accessible […]
Nov, 4

Simple dynamic LOD for geometry images

We present a new approach for dynamic LOD processing for geometry images (GIs) in the graphics processing unit (GPU). A GI mipmap is constructed from a scanned 3D model, then a mipmap selector map is created from the camera position’s information and the GI-mipmap. Using the mipmap selector map and the current camera position, some […]

* * *

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors