high performance computing on graphics processing units: hgpu.org

Posts

Dec, 12

Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy

By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and sparse iterative techniques such as Krylov subspace methods. The approach […]
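The core idea can be illustrated with classical mixed-precision iterative refinement. This is a minimal Python sketch, not the authors' code. It uses a small dense matrix rather than a sparse multifrontal or supernodal factorization, and it emulates 32-bit arithmetic by rounding through `struct`. The helper names (`f32`, `lu_factor_f32`, `lu_solve_f32`, `mixed_precision_solve`) are hypothetical. The expensive factorization runs in 32-bit arithmetic; only the residual and the solution update use 64-bit arithmetic.

```python
import struct


def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(A):
    """LU factorization with partial pivoting, rounding every operation to 32-bit."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))  # perm[i] = original row now stored at position i
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])  # multiplier, stored in the L part
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve_f32(LU, perm, r):
    """Solve A x = r in 32-bit using the factors of P A = L U."""
    n = len(LU)
    y = [f32(r[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution; L has an implicit unit diagonal
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y


def mixed_precision_solve(A, b, max_iter=10, tol=1e-14):
    """Factor once in 32-bit, then refine the solution with 64-bit residuals."""
    LU, perm = lu_factor_f32(A)
    x = lu_solve_f32(LU, perm, b)
    b_max = max(abs(v) for v in b)
    for _ in range(max_iter):
        # Residual r = b - A x computed in 64-bit (native Python floats)
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        if max(abs(v) for v in r) <= tol * b_max:
            break
        d = lu_solve_f32(LU, perm, r)  # cheap correction reusing the 32-bit factors
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in 64-bit
    return x
```

The O(n³) factorization is done once in the faster precision. Each refinement step costs only O(n²), and for a reasonably well-conditioned matrix a few steps recover 64-bit accuracy.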

Dec, 12

The visible ear surgery simulator

This paper presents a real-time computer simulation of surgical procedures in the ear, in which a surgeon drills into the temporal bone to gain access to the middle or inner ear. The purpose of this simulator is to support development of anatomical insight and training of drilling skills for both medical students and experienced otologists. […]

OpenGL

Dec, 12

Parallel algorithms for approximation of distance maps on parametric surfaces

We present an efficient O(n) numerical algorithm for first-order approximation of geodesic distances on geometry images, where n is the number of points on the surface. The structure of our algorithm allows efficient implementation on parallel architectures. Two implementations, one on a SIMD processor and one on a GPU, are discussed. Numerical results demonstrate up […]
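A first-order distance map on a regular grid can be illustrated with the fast sweeping method. This is a swapped-in technique, not the paper's parallel algorithm. The sequential Python sketch assumes a flat parameterization with uniform spacing `h`; a real geometry image would use the surface metric. The names `upwind_update` and `fast_sweep_distance` are hypothetical. Each point is updated with the standard first-order upwind solution of |∇u| = 1, sweeping the grid in four orderings until nothing changes.

```python
import math

INF = math.inf


def upwind_update(u, i, j, h):
    """First-order upwind solution of |grad u| = 1 at grid point (i, j)."""
    n, m = len(u), len(u[0])
    a = min(u[i - 1][j] if i > 0 else INF, u[i + 1][j] if i < n - 1 else INF)
    b = min(u[i][j - 1] if j > 0 else INF, u[i][j + 1] if j < m - 1 else INF)
    if a == INF and b == INF:
        return INF
    if abs(a - b) >= h:  # information arrives from one direction only
        return min(a, b) + h
    return 0.5 * (a + b + math.sqrt(2 * h * h - (a - b) ** 2))


def fast_sweep_distance(n, m, sources, h=1.0, max_rounds=20):
    """Approximate distance map on an n x m grid from the given (i, j) sources."""
    u = [[INF] * m for _ in range(n)]
    for i, j in sources:
        u[i][j] = 0.0
    rows_fwd, rows_bwd = range(n), range(n - 1, -1, -1)
    cols_fwd, cols_bwd = range(m), range(m - 1, -1, -1)
    orders = [(rows_fwd, cols_fwd), (rows_bwd, cols_fwd),
              (rows_fwd, cols_bwd), (rows_bwd, cols_bwd)]
    for _ in range(max_rounds):
        changed = False
        for rows, cols in orders:  # one Gauss-Seidel sweep per ordering
            for i in rows:
                for j in cols:
                    new = upwind_update(u, i, j, h)
                    if new < u[i][j]:
                        u[i][j] = new
                        changed = True
        if not changed:
            break
    return u
```

Consider a single source at the centre of a 5×5 grid. Axis-aligned points receive exact distances, while the diagonal neighbour gets 1 + √2/2 ≈ 1.707 instead of √2. That gap is the overestimate typical of a first-order scheme.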

Dec, 12

Stream Processing of Integral Images for Real-Time Object Detection

This paper presents the design and evaluation of a stream processing implementation of the Integral Image algorithm. The Integral Image is a key component of many image processing algorithms, in particular Haar-like feature-based systems. Modern GPUs provide a large number of processors with a peak floating-point performance that is significantly higher than […]
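The integral image stores at (x, y) the sum of all pixels above and to the left of that point, inclusive. This minimal Python sketch, with hypothetical helper names, computes it as two passes of prefix sums: rows first, then columns. On a GPU each pass is commonly mapped to many independent parallel scans, though the paper's exact stream decomposition may differ. Once the table exists, any rectangular sum, and therefore a Haar-like feature, costs four lookups.

```python
from itertools import accumulate


def integral_image(img):
    """ii[y][x] = sum of img[v][u] for all v <= y and u <= x."""
    # Pass 1: inclusive prefix sum along each row (rows are independent scans)
    rows = [list(accumulate(r)) for r in img]
    # Pass 2: inclusive prefix sum down each column (columns are independent scans)
    cols = [list(accumulate(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def box_sum(ii, x0, y0, x1, y1):
    """Sum of img[y][x] over x0 <= x <= x1 and y0 <= y <= y1, using four lookups."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total


def haar_two_rect(ii, x0, y0, x1, y1):
    """Hypothetical two-rectangle Haar-like feature: left half minus right half."""
    xm = (x0 + x1) // 2
    return box_sum(ii, x0, y0, xm, y1) - box_sum(ii, xm + 1, y0, x1, y1)
```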

Dec, 12

Real-time digital holographic microscopy using the graphic processing unit

Digital holographic microscopy (DHM) is a well-known, powerful method that allows the amplitude and phase of a specimen to be observed simultaneously. Reconstructing an image from a hologram requires numerous Fresnel diffraction calculations, which can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time […]

CUDA
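FFT-based Fresnel propagation can be sketched in one dimension using the transfer-function (convolution) form H(f) = exp(ikz) · exp(−iπλz f²). This is an illustrative assumption, not the paper's reconstruction pipeline: DHM software usually works in 2-D and may use the single-FFT Fresnel transform instead. The function names are hypothetical. A recursive radix-2 FFT is included so the sketch needs only the standard library.

```python
import cmath
import math


def fft(a, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    n = len(a)
    return [v / n for v in fft(a, inverse=True)]


def fresnel_propagate_1d(u0, dx, wavelength, z):
    """Propagate a sampled 1-D field u0 (sample spacing dx) over distance z."""
    n = len(u0)
    k = 2 * math.pi / wavelength
    spectrum = fft(u0)
    # Spatial frequencies in FFT order: 0, 1, ..., n/2 - 1, -n/2, ..., -1, times 1 / (n dx)
    freqs = [(m if m < n // 2 else m - n) / (n * dx) for m in range(n)]
    # Fresnel transfer function: unit modulus, so only the phase of each frequency changes
    H = [cmath.exp(1j * k * z) * cmath.exp(-1j * math.pi * wavelength * z * f * f)
         for f in freqs]
    return ifft([s * h for s, h in zip(spectrum, H)])
```

Because |H(f)| = 1, propagation preserves total intensity, and z = 0 returns the input field. Both properties make convenient sanity checks. The cost is dominated by the two O(n log n) FFTs, which is the part a GPU implementation would accelerate.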

Dec, 12

A compiler framework for optimization of affine loop nests for GPGPUs

GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general-purpose applications on their GPUs. However, manual development of high-performance parallel code for GPUs is still very challenging. In this paper, a number of issues are addressed towards […]

CUDA
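Loop tiling is one transformation commonly applied to affine loop nests when targeting GPUs, because tiles map naturally onto thread blocks and on-chip shared memory. The Python sketch below uses matrix multiplication with a hypothetical tile size `T` and only shows that the tiled nest computes the same result as the original. It is not the paper's framework, and the comments about GPU mapping describe one common choice rather than the authors' scheme.

```python
def matmul_naive(A, B):
    """Original affine loop nest: C[i][j] += A[i][p] * B[p][j]."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def matmul_tiled(A, B, T=2):
    """Same iteration space, tiled by T in every dimension (T is a hypothetical tile size)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for ii in range(0, n, T):  # an (ii, jj) tile could be assigned to one thread block
        for jj in range(0, m, T):
            for pp in range(0, k, T):  # T x T slices of A and B: candidates for shared memory
                for i in range(ii, min(ii + T, n)):  # each (i, j) could be one thread
                    for j in range(jj, min(jj + T, m)):
                        for p in range(pp, min(pp + T, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C
```

The `min(...)` bounds handle sizes that are not multiples of `T`, which matters once the loop bounds are general affine expressions rather than constants.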

Dec, 12

Two-electron integral evaluation on the graphics processor unit

We propose an algorithm to evaluate the Coulomb potential in the ab initio density functional calculation on the graphics processor unit (GPU). The numerical accuracy required for the algorithm is investigated in detail. It is shown that the GPU, which natively supports only single-precision floating-point numbers, can take part in the major computational tasks. Because […]

CUDA

Dec, 12

Deformable model collision detection using A-buffer

This paper presents a new image-space algorithm for real-time collision detection, where the GPU computes the potentially colliding sets (PCSs), and the CPU performs the standard triangle/triangle intersection test. When the bounding boxes of two objects intersect, the intersection is passed to the GPU. By rendering the objects in the intersection region, the GPU saves […]
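The CPU-side first step described above, intersecting the bounding boxes of two objects and selecting the geometry inside the overlap, can be sketched as follows. This Python sketch assumes axis-aligned bounding boxes and uses hypothetical helper names. The A-buffer rendering on the GPU and the exact triangle/triangle test are not shown.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner)


def aabb_intersection(a: Box, b: Box) -> Optional[Box]:
    """Overlap region of two axis-aligned boxes, or None if they are disjoint."""
    lo = tuple(max(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(min(x, y) for x, y in zip(a[1], b[1]))
    if any(lo_k > hi_k for lo_k, hi_k in zip(lo, hi)):
        return None
    return lo, hi


def triangle_aabb(tri: Sequence[Point]) -> Box:
    """Axis-aligned bounding box of one triangle."""
    return (tuple(min(v[k] for v in tri) for k in range(3)),
            tuple(max(v[k] for v in tri) for k in range(3)))


def triangles_in_region(tris: Sequence[Sequence[Point]], region: Box) -> list:
    """Indices of triangles whose bounding boxes touch the region (a conservative cull)."""
    return [i for i, t in enumerate(tris)
            if aabb_intersection(triangle_aabb(t), region) is not None]
```

The cull is conservative: a triangle whose box touches the region may still miss it, which is why an exact test is still needed afterwards.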

Dec, 12

Data parallel execution challenges and runtime performance of agent simulations on GPUs

Programmable graphics processing units (GPUs) have emerged as excellent computational platforms for certain general-purpose applications. The data parallel execution capabilities of GPUs specifically point to the potential for effective use in simulations of agent-based models (ABM). In this paper, the computational efficiency of ABM simulation on GPUs is evaluated on representative ABM benchmarks. The runtime […]

Dec, 12

A Fast Similarity Join Algorithm Using Graphics Processing Units

A similarity join operation A ⋈ε B takes two sets of points A, B and a value ε ∈ ℝ, and outputs pairs of points p ∈ A, q ∈ B such that the distance D(p, q) < ε. Similarity joins find use in a variety of fields, such as clustering, text mining, and multimedia databases. A […]

CUDA
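A common way to avoid comparing all |A|·|B| pairs is to bucket the points of B into a uniform grid whose cell side is ε. Any q with D(p, q) < ε must then lie in one of the 3^d cells around p's cell. The Python sketch below is a sequential illustration of that idea with Euclidean distance and hypothetical names. It is not necessarily the paper's GPU algorithm.

```python
import math
from collections import defaultdict
from itertools import product


def similarity_join(A, B, eps):
    """All index pairs (i, j) with Euclidean distance D(A[i], B[j]) < eps."""
    grid = defaultdict(list)  # cell coordinates -> indices of B's points in that cell
    for j, q in enumerate(B):
        grid[tuple(math.floor(c / eps) for c in q)].append(j)
    pairs = []
    for i, p in enumerate(A):  # each probe point is independent (data parallel)
        cell = tuple(math.floor(c / eps) for c in p)
        # Any q closer than eps differs by at most one cell in every coordinate
        for offset in product((-1, 0, 1), repeat=len(p)):
            key = tuple(c + o for c, o in zip(cell, offset))
            for j in grid.get(key, ()):
                if math.dist(p, B[j]) < eps:
                    pairs.append((i, j))
    return sorted(pairs)
```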

Dec, 12

A map reduce framework for programming graphics processors

Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high performance general purpose computation. We describe a framework designed for high performance GPU programming, built on Nvidia’s Compute Unified Device Architecture (CUDA) platform. The framework is built around the Map Reduce abstraction, which allows application developers to focus on their application, while […]

CUDA
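The Map Reduce abstraction itself is small. The sequential Python sketch below uses hypothetical names and has no relation to the framework's actual API. It shows the three phases: a per-input map that is independent and therefore data parallel, a grouping of intermediate values by key, and a per-key reduce.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Sequential sketch: map -> group by key -> reduce."""
    groups = defaultdict(list)
    for item in inputs:  # map: each input is processed independently
        for key, value in mapper(item):
            groups[key].append(value)  # shuffle: collect values by key
    # reduce: each key is processed independently as well
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_map(line):
    for word in line.split():
        yield word, 1


def word_count_reduce(word, counts):
    return sum(counts)
```

The application developer writes only `word_count_map` and `word_count_reduce`. Scheduling both phases across many GPU threads is left to the framework.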

Dec, 12

CUDA: Scalable parallel programming for high-performance scientific computing

Graphics processing units (GPUs), originally designed for computer video cards, have emerged as the most powerful chip in a high-performance workstation. Unlike multicore CPU architectures, which currently ship with two or four cores, GPU architectures are “manycore” with hundreds of cores capable of running thousands of threads in parallel. NVIDIA’s CUDA is a co-evolved hardware-software […]

CUDA
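In CUDA, a kernel is launched over a grid of thread blocks, and each thread typically derives a global index as `blockIdx.x * blockDim.x + threadIdx.x`. The Python sketch below only emulates that indexing, sequentially, for a SAXPY kernel (y ← a·x + y). The names `launch`, `saxpy_kernel` and `blocks_needed` are hypothetical, and real threads within a block run concurrently rather than in a loop.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch by calling the kernel once per (block, thread) pair."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    # Global index, mirroring blockIdx.x * blockDim.x + threadIdx.x in CUDA C
    i = block_idx * block_dim + thread_idx
    if i < n:  # guard: the last block may contain threads past the end of the arrays
        y[i] = a * x[i] + y[i]


def blocks_needed(n, block_dim):
    """Smallest grid size that covers n elements, i.e. ceil(n / block_dim)."""
    return (n + block_dim - 1) // block_dim
```

For n = 1000 and 256 threads per block, four blocks (1024 threads) are launched, and the guard keeps the 24 surplus threads idle.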