high performance computing on graphics processing units: hgpu.org

Posts

Nov, 22

A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware

Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, produced reads are significantly shorter and more error-prone compared to the traditional Sanger shotgun sequencing method. This poses challenges for de novo DNA fragment assembly algorithms in terms of both accuracy (to deal with […]

CUDA

Nov, 22

Inverse scattering and refraction corrected reflection for breast cancer imaging

Reflection ultrasound (US) has been utilized as an adjunct imaging modality for over 30 years. TechniScan, Inc. has developed unique transmission and concomitant reflection algorithms that are used to reconstruct images from data gathered during a tomographic breast scanning process called Warm Bath Ultrasound (WBU). The transmission algorithm yields high resolution, 3D, attenuation and speed […]

Nov, 22

Interventional 4-D Motion Estimation and Reconstruction of Cardiac Vasculature without Motion Periodicity Assumption

Anatomical and functional information of cardiac vasculature is a key component in the field of interventional cardiology. With the technology of C-arm CT it is possible to reconstruct static intraprocedural 3-D images from angiographic projection data. Current approaches attempt to add the temporal dimension (4-D). Under the assumption of periodic heart motion, ECG-gating techniques can […]

CUDA

Nov, 22

A dynamically configurable coprocessor for convolutional neural networks

Convolutional neural network (CNN) applications range from recognition and reasoning (such as handwriting recognition, facial expression recognition and video surveillance) to intelligent text applications such as semantic text analysis and natural language processing. Two key observations drive the design of a new architecture for CNN. First, CNN workloads exhibit a widely varying mix of […]

Nov, 22

A fast stereo matching algorithm suitable for embedded real-time systems

In this paper, the challenge of fast stereo matching for embedded systems is tackled. Limited resources, e.g. memory and processing power, and most importantly real-time capability on embedded systems for robotic applications, do not permit the use of most sophisticated stereo matching approaches. The strengths and weaknesses of different matching approaches have been analyzed and […]

Nov, 22

Optimising GPR modelling: A practical, multi-threaded approach to 3D FDTD numerical modelling

The demand for advanced interpretational tools has led to the development of highly sophisticated, computationally demanding, 3D GPR processing and modelling techniques. Many of these methods solve very large problems with stepwise methods that utilise numerically similar functions within iterative computational loops. Problems of this nature are readily parallelised by splitting the computational domain into […]

Nov, 22

Fast reduction of undersampling artifacts in radial MR angiography with 3D total variation on graphics hardware

OBJECTIVE: Subsampling of radially encoded MRI acquisitions in combination with sparsity promoting methods opened a door to significantly increased imaging speed, which is crucial for many important clinical applications. In particular, it has been shown recently that total variation (TV) regularization efficiently reduces undersampling artifacts. The drawback of the method is the long reconstruction time […]

CUDA

Nov, 22

Real-time ambient occlusion and halos with summed area tables

Volume models often show high depth complexity. This poses difficulties to the observer in judging the spatial relationships accurately. Illustrators usually use certain techniques such as improving the shading through shadows, halos, or edge darkening in order to enhance depth perception of certain structures. Both effects are difficult to generate in real-time for volumetric models. […]

CUDA

Nov, 22

Reionization Simulations Powered by Graphics Processing Units. I. On the Structure of the Ultraviolet Radiation Field

We present a set of cosmological simulations with radiative transfer in order to model the reionization history of the universe from z = 18 down to z = 6. Galaxy formation and the associated star formation are followed self-consistently with gas and dark matter dynamics using the RAMSES code, while radiative transfer is performed as […]

CUDA

Nov, 22

Scale-dependent and example-based grayscale stippling

We present an example-based approach to synthesizing stipple illustrations for static 2D images that produces scale-dependent results appropriate for an intended spatial output size and resolution. We show how treating stippling as a grayscale process allows us to both produce on-screen output and to achieve stipple merging at medium tonal ranges. At the same time […]

Nov, 22

Field modelling acceleration on ultrasonic systems using graphic hardware

Field modelling is a common practice in the area of ultrasonic non-destructive evaluation (NDE) because it is a useful tool for assessing NDE imaging. However, it is a very time-consuming task because of its complexity and data volume, which makes it difficult to use in systems that demand real-time responses. Recently, graphics processing units (GPUs) have […]

Nov, 22

Dense photometric stereo reconstruction on many core GPUs

Photometric stereo algorithms are used in many applications for the 3D reconstruction of scenes from a number of 2D images, illuminated by calibrated light sources of different directions. However, the widely used assumption that the direction of the light remains constant across all pixels of the image usually induces reconstruction errors. We propose here a […]

CUDA

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

* * *

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
