high performance computing on graphics processing units: hgpu.org

Posts

Dec, 12

Map-reduce as a Programming Model for Custom Computing Machines

The map-reduce model requires users to express their problem in terms of a map function that processes single records in a stream, and a reduce function that merges all mapped outputs to produce a final result. By exposing structural similarity in this way, a number of key issues associated with the design of custom computing […]

CUDA
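
As a rough illustration of the model summarized in the map-reduce post above, the sketch below counts words in a stream of text records. The word-count task and the names `map_record`, `merge_counts`, and `map_reduce` are assumptions chosen for illustration; they are not taken from the paper, which targets custom computing machines rather than Python.

```python
from functools import reduce


def map_record(record: str) -> dict:
    """Map step (hypothetical example): process one record on its own."""
    counts = {}
    for word in record.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def merge_counts(left: dict, right: dict) -> dict:
    """Reduce step (hypothetical example): merge two mapped outputs."""
    merged = dict(left)
    for word, n in right.items():
        merged[word] = merged.get(word, 0) + n
    return merged


def map_reduce(stream) -> dict:
    """Apply the map function to every record, then fold the results."""
    return reduce(merge_counts, map(map_record, stream), {})


if __name__ == "__main__":
    print(map_reduce(["gpu fpga gpu", "fpga cpu"]))
```

Because each record is mapped independently and the merge is associative, the map calls can in principle run in parallel and the reduction can be arranged as a tree.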

Dec, 12

A decompression pipeline for accelerating out-of-core volume rendering of time-varying data

This paper presents a decompression pipeline capable of accelerating out-of-core volume rendering of time-varying scalar data. Our pipeline is based on a two-stage compression method that cooperatively uses the CPU and the graphics processing unit (GPU) to transfer compressed data entirely from the storage device to the video memory. This method combines two different compression […]

OpenGL

Dec, 12

Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy

By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and sparse iterative techniques such as Krylov subspace methods. The approach […]

Dec, 12

The visible ear surgery simulator

This paper presents a real-time computer simulation of surgical procedures in the ear, in which a surgeon drills into the temporal bone to gain access to the middle or inner ear. The purpose of this simulator is to support development of anatomical insight and training of drilling skills for both medical students and experienced otologists. […]

OpenGL

Dec, 12

Parallel algorithms for approximation of distance maps on parametric surfaces

We present an efficient O(n) numerical algorithm for first-order approximation of geodesic distances on geometry images, where n is the number of points on the surface. The structure of our algorithm allows efficient implementation on parallel architectures. Two implementations on a SIMD processor and on a GPU are discussed. Numerical results demonstrate up […]

Dec, 12

Stream Processing of Integral Images for Real-Time Object Detection

This paper presents the design and evaluation of a stream processing implementation of the Integral Image algorithm. The Integral Image is a key component of many image processing algorithms, in particular Haar-like feature-based systems. Modern GPUs provide a large number of processors with a peak floating point performance that is significantly higher than […]
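
For readers unfamiliar with the Integral Image mentioned in this abstract, here is a minimal sequential sketch of the standard summed-area table. It is not the paper's stream processing implementation; the function names `integral_image` and `rect_sum` and the zero-padded layout are assumptions made for illustration.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


if __name__ == "__main__":
    table = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(rect_sum(table, 1, 1, 3, 3))
```

Once the table is built, any rectangular sum costs four lookups regardless of the rectangle's size, which is why Haar-like feature evaluation relies on it.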

Dec, 12

Real-time digital holographic microscopy using the graphic processing unit

Digital holographic microscopy (DHM) is a well-known powerful method allowing both the amplitude and phase of a specimen to be simultaneously observed. In order to obtain a reconstructed image from a hologram, numerous calculations for the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time […]

CUDA

Dec, 12

A compiler framework for optimization of affine loop nests for GPGPUs

GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their GPUs. However, manual development of high-performance parallel code for GPUs is still very challenging. In this paper, a number of issues are addressed towards […]

CUDA

Dec, 12

Two-electron integral evaluation on the graphics processor unit

We propose an algorithm to evaluate the Coulomb potential in ab initio density functional calculations on the graphics processor unit (GPU). The numerical accuracy required for the algorithm is investigated in detail. It is shown that the GPU, which natively supports only single-precision floating-point numbers, can take part in the major computational tasks. Because […]

CUDA

Dec, 12

Deformable model collision detection using A-buffer

This paper presents a new image-space algorithm for real-time collision detection, where the GPU computes the potentially colliding sets (PCSs), and the CPU performs the standard triangle/triangle intersection test. When the bounding boxes of two objects intersect, the intersection is passed to the GPU. By rendering the objects in the intersection region, the GPU saves […]

Dec, 12

Data parallel execution challenges and runtime performance of agent simulations on GPUs

Programmable graphics processing units (GPUs) have emerged as excellent computational platforms for certain general-purpose applications. The data parallel execution capabilities of GPUs specifically point to the potential for effective use in simulations of agent-based models (ABM). In this paper, the computational efficiency of ABM simulation on GPUs is evaluated on representative ABM benchmarks. The runtime […]

Dec, 12

A Fast Similarity Join Algorithm Using Graphics Processing Units

A similarity join operation A ⋈ε B takes two sets of points A, B and a value ε ∈ ℝ, and outputs pairs of points p ∈ A, q ∈ B, such that the distance D(p, q) < ε. Similarity joins find use in a variety of fields, such as clustering, text mining, and multimedia databases. A […]

CUDA
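
The definition in the similarity join abstract above can be made concrete with a brute-force baseline. This sketch only restates the definition; it is not the paper's GPU algorithm, and it assumes Euclidean distance for D, which the excerpt does not specify. The name `similarity_join` is hypothetical.

```python
import math


def similarity_join(a, b, eps):
    """Return every pair (p, q), p in a and q in b, with D(p, q) < eps.

    Assumption: D is the Euclidean distance (math.dist, Python 3.8+).
    This compares all |a| * |b| pairs, so it is a reference baseline only.
    """
    return [(p, q) for p in a for q in b if math.dist(p, q) < eps]


if __name__ == "__main__":
    A = [(0.0, 0.0), (5.0, 5.0)]
    B = [(0.5, 0.0), (5.0, 6.0), (9.0, 9.0)]
    print(similarity_join(A, B, 1.0))
```

Note the strict inequality: a pair at distance exactly ε is excluded, matching D(p, q) < ε in the definition.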
