Posts
Dec, 12
Map-reduce as a Programming Model for Custom Computing Machines
The map-reduce model requires users to express their problem in terms of a map function that processes single records in a stream, and a reduce function that merges all mapped outputs to produce a final result. By exposing structural similarity in this way, a number of key issues associated with the design of custom computing […]
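The two roles described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration only, not the paper's hardware design: the names `map_record`, `reduce_counts`, and `map_reduce`, and the word-counting task itself, are hypothetical choices made for the example.

```python
from functools import reduce
from typing import Iterable


def map_record(record: str) -> int:
    """Map step: process one record in isolation (here, count its words)."""
    return len(record.split())


def reduce_counts(a: int, b: int) -> int:
    """Reduce step: merge two mapped outputs (here, by addition)."""
    return a + b


def map_reduce(stream: Iterable[str]) -> int:
    """Apply the map function to every record, then fold the results."""
    return reduce(reduce_counts, map(map_record, stream), 0)


# Hypothetical input stream of three records.
records = ["map reduce model", "custom computing machines", "stream"]
total = map_reduce(records)
print(total)
```

Because `map_record` touches only one record at a time and `reduce_counts` is associative, the mapped values could be computed and merged in any grouping, which is the structural regularity the excerpt refers to.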
Dec, 12
A decompression pipeline for accelerating out-of-core volume rendering of time-varying data
This paper presents a decompression pipeline capable of accelerating out-of-core volume rendering of time-varying scalar data. Our pipeline is based on a two-stage compression method that cooperatively uses the CPU and the graphics processing unit (GPU) to transfer compressed data entirely from the storage device to the video memory. This method combines two different compression […]
Dec, 12
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy
By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and sparse iterative techniques such as Krylov subspace methods. The approach […]
Dec, 12
The visible ear surgery simulator
This paper presents a real-time computer simulation of surgical procedures in the ear, in which a surgeon drills into the temporal bone to gain access to the middle or inner ear. The purpose of this simulator is to support development of anatomical insight and training of drilling skills for both medical students and experienced otologists. […]
Dec, 12
Parallel algorithms for approximation of distance maps on parametric surfaces
We present an efficient O(n) numerical algorithm for first-order approximation of geodesic distances on geometry images, where n is the number of points on the surface. The structure of our algorithm allows efficient implementation on parallel architectures. Two implementations on a SIMD processor and on a GPU are discussed. Numerical results demonstrate up […]
Dec, 12
Stream Processing of Integral Images for Real-Time Object Detection
This paper presents the design and evaluation of a stream processing implementation of the Integral Image algorithm. The Integral Image is a key component of many image processing algorithms, in particular Haar-like feature-based systems. Modern GPUs provide a large number of processors with a peak floating point performance that is significantly higher than […]
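For readers unfamiliar with the Integral Image (also called a summed-area table), the following is a minimal sequential sketch of the data structure itself. It is a plain CPU reference, not the stream processing implementation the paper evaluates; the function names `integral_image` and `box_sum` are hypothetical.

```python
def integral_image(img):
    """Return the summed-area table: ii[y][x] is the sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the inclusive rectangle, using at most four table lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
print(box_sum(table, 1, 1, 2, 2))  # sum of the lower-right 2x2 block
```

Once the table is built, any rectangular sum costs a constant number of lookups regardless of the rectangle's size, which is why Haar-like feature detectors rely on it.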
Dec, 12
Real-time digital holographic microscopy using the graphic processing unit
Digital holographic microscopy (DHM) is a well-known powerful method allowing both the amplitude and phase of a specimen to be simultaneously observed. In order to obtain a reconstructed image from a hologram, numerous calculations for the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time […]
Dec, 12
A compiler framework for optimization of affine loop nests for GPGPUs
GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their GPUs. However, manual development of high-performance parallel code for GPUs is still very challenging. In this paper, a number of issues are addressed towards […]
Dec, 12
Two-electron integral evaluation on the graphics processor unit
We propose an algorithm to evaluate the Coulomb potential in ab initio density functional calculations on the graphics processor unit (GPU). The numerical accuracy required for the algorithm is investigated in detail. It is shown that the GPU, which natively supports only single-precision floating-point numbers, can take part in the major computational tasks. Because […]
Dec, 12
Deformable model collision detection using A-buffer
This paper presents a new image-space algorithm for real-time collision detection, where the GPU computes the potentially colliding sets (PCSs), and the CPU performs the standard triangle/triangle intersection test. When the bounding boxes of two objects intersect, the intersection is passed to the GPU. By rendering the objects in the intersection region, the GPU saves […]
Dec, 12
Data parallel execution challenges and runtime performance of agent simulations on GPUs
Programmable graphics processing units (GPUs) have emerged as excellent computational platforms for certain general-purpose applications. The data parallel execution capabilities of GPUs specifically point to the potential for effective use in simulations of agent-based models (ABM). In this paper, the computational efficiency of ABM simulation on GPUs is evaluated on representative ABM benchmarks. The runtime […]
Dec, 12
A Fast Similarity Join Algorithm Using Graphics Processing Units
A similarity join operation A ⋈_ε B takes two sets of points A, B and a value ε ∈ ℝ, and outputs pairs of points p ∈ A, q ∈ B such that the distance D(p,q) < ε. Similarity joins find use in a variety of fields, such as clustering, text mining, and multimedia databases. A […]
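The definition in this excerpt can be stated directly as a brute-force reference in Python. This sketch only illustrates what the join computes; it is not the paper's GPU algorithm, it assumes D is the Euclidean distance, and the name `similarity_join` and the sample points are hypothetical.

```python
import math


def similarity_join(A, B, eps):
    """Return all pairs (p, q), p in A and q in B, with distance D(p, q) < eps.

    Brute force: every pair is tested, so the cost is O(|A| * |B|).
    D is assumed here to be the Euclidean distance.
    """
    return [(p, q) for p in A for q in B if math.dist(p, q) < eps]


# Hypothetical 2-D point sets.
A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.5, 0.0), (5.0, 5.9), (9.0, 9.0)]
pairs = similarity_join(A, B, 1.0)
print(pairs)
```

The quadratic number of independent distance tests is what makes the operation a natural candidate for acceleration.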