Posts
Nov, 5
Lattice SU(2) on GPU’s
We discuss the CUDA approach to the simulation of pure gauge Lattice SU(2). CUDA is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU with single precision. Analysis with single and multiple GPU’s, using CUDA and OPENMP, are also […]
Nov, 5
ECM on Graphics Cards
This paper reports record-setting performance for the elliptic-curve method of integer factorization: for example, 926.11 curves/second for ECM stage 1 with B1=8192 for 280-bit integers on a single PC. The state-of-the-art GMP-ECM software handles 124.71 curves/second for ECM stage 1 with B1=8192 for 280-bit integers using all four cores of a 2.4 GHz Core 2 Quad […]
Nov, 5
Realistic real-time sound re-synthesis and processing for interactive virtual worlds
We present new GPU-based techniques for implementing linear digital filters for real-time audio processing. Our solution for recursive filters is the first presented in the literature. We demonstrate the relevance of these algorithms to computer graphics by synthesizing realistic sounds of colliding objects made of different materials, such as glass, plastic, and wood, in real […]
Nov, 5
Solving Path Problems on the GPU
We consider the computation of shortest paths on Graphic Processing Units (GPUs). The blocked recursive elimination strategy we use is applicable to a class of algorithms (such as all-pairs shortest-paths, transitive closure, and LU decomposition without pivoting) having similar data access patterns. Using the all-pairs shortest-paths problem as an example, we uncover potential gains over […]
Nov, 5
Parallel search on video cards
Recent approaches exploiting the massively parallel architecture of graphics processors (GPUs) to accelerate database operations have achieved intriguing results. While parallel sorting received significant attention, parallel search has not been explored. With p-ary search we present a novel parallel search algorithm for large-scale database index operations that scales with the number of processors and outperforms […]
Nov, 5
A Performance Comparison of CUDA and OpenCL
CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, […]
Nov, 5
Faster matrix-vector multiplication on GeForce 8800GTX
Recently a GPU has acquired programmability to perform general purpose computation fast by running ten thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on NVIDIA CUDA architecture. The experimental results on GeForce 8800GTX show that the proposed algorithm runs maximum 15.69 (resp., 32.88) times faster than the sgemv routine […]
Nov, 5
Acceleration of direct volume rendering with programmable graphics hardware
We propose a method to accelerate direct volume rendering using programmable graphics hardware (GPU). In the method, texture slices are grouped together to form a texture slab. Rendering non-empty slabs from front to back viewing order generates the resultant image. Considering each pixel of the image as a ray, slab silhouette maps (SSMs) are used […]
Nov, 5
Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors
This paper presents a graphics processor based implementation of the Finite Difference Time Domain (FDTD), which uses a central finite differencing scheme for solving Maxwell’s equations for electromagnetics. FDTD simulations can be very computationally expensive and require thousands of CPU hours to solve on traditional general purpose processors. Modern Graphics Processing Units (GPUs) found in […]
Nov, 5
Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study
A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx […]
Nov, 5
Exploiting frame-to-frame coherence for accelerating high-quality volume raycasting on graphics hardware
GPU-based raycasting offers an interesting alternative to conventional slice-based volume rendering due to the inherent flexibility and the high quality of the generated images. Recent advances in graphics hardware allow for the ray traversal and volume sampling to be executed on a per-fragment level completely on the GPU leading to interactive framerates. In this work […]
Nov, 5
GPUTeraSort: high performance graphics co-processor sorting for large database management
We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces […]