high performance computing on graphics processing units: hgpu.org

Posts

Nov, 17

A Survey of Medical Image Registration on Multicore and the GPU

In this article, we look at early, recent, and state-of-the-art methods for registration of medical images using a range of high-performance computing (HPC) architectures including symmetric multiprocessing (SMP), massively multiprocessing (MMP), and architectures with distributed memory (DM), and nonuniform memory access (NUMA). The article is designed to be self-sufficient. We will take the time to […]

Nov, 17

TeraFLOP computing on a desktop PC with GPUs for 3D CFD

A very efficient implementation of a lattice Boltzmann (LB) kernel in 3D on a graphical processing unit using the compute unified device architecture interface developed by nVIDIA is presented. By exploiting the explicit parallelism offered by the graphics hardware, we obtain an efficiency gain of up to two orders of magnitude with respect to the […]

Nov, 17

FAST: fast architecture sensitive tree search on modern CPUs and GPUs

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join and aggregation. However, unlike other primitives, tree search presents significant challenges due to […]

Nov, 17

Scalable parallel programming with CUDA

Is CUDA the parallel programming model that application developers have been waiting for?

CUDA

Nov, 17

Fast free-form deformation using graphics processing units

A large number of algorithms have been developed to perform non-rigid registration and it is a tool commonly used in medical image analysis. The free-form deformation algorithm is a well-established technique, but is extremely time consuming. In this paper we present a parallel-friendly formulation of the algorithm suitable for graphics processing unit execution. Using our […]

CUDA

Nov, 17

A Graphics Parallel Memory Organization Exploiting Request Correlations

Real-time graphics applications require memory organizations featuring parallel pixel access and low-cost implementation. This work builds on a nonlinear skew mapping scheme and exploits the correlation between consecutive requests for pixels to design an efficient parallel memory organization. The mapping achieves parallel access, of mn pixels in various shapes, to the memory organized with mn […]

Nov, 17

permGPU: Using graphics processing units in RNA microarray association studies

BACKGROUND: Many analyses of microarray association studies involve permutation and bootstrap resampling, and cross-validation, that are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. RESULTS: We have developed a CUDA based implementation, permGPU, that employs graphics processing […]

CUDA

Nov, 17

Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices

In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing. Therefore, we […]

CUDA

Nov, 17

Accelerating simultaneous algebraic reconstruction technique with motion compensation using CUDA-enabled GPU

PURPOSE: To accelerate the simultaneous algebraic reconstruction technique (SART) with motion compensation for speedy and quality computed tomography reconstruction by exploiting CUDA-enabled GPU. METHODS: Two core techniques are proposed to fit SART into the CUDA architecture: (1) a ray-driven projection along with hardware trilinear interpolation, and (2) a voxel-driven back-projection that can avoid redundant computation […]

CUDA

Nov, 17

Eye-Full Tower: A GPU-based variable multibaseline omnidirectional stereovision system with automatic baseline selection for outdoor mobile robot navigation

In recent years, the number of researchers and projects involved in developing omnidirectional vision systems for various applications has gradually increased. The primary factors contributing to this trend are the availability of inexpensive and high resolution vision sensors, robust and fast computers and […]

CUDA

Nov, 17

SHEsisEpi, a GPU-enhanced genome-wide SNP-SNP interaction scanning algorithm, efficiently reveals the risk of genetic epistasis in bipolar disorder

We developed a GPU-based analytical method, named SHEsisEpi, which focuses purely on risk epistasis in a genome-wide association study (GWAS) of complex traits, excluding the contamination of marginal effects caused by single-locus association. We analyzed the Wellcome Trust Case Control Consortium’s (WTCCC) GWAS data of bipolar disorder (BPD) with 500K SNPs.

Nov, 17

Alignator: A GPU powered software package for robust fiducial-less alignment of cryo tilt-series

The robust alignment of tilt-series collected for cryo-electron tomography in the absence of fiducial markers is a problem that, especially for tilt-series of vitreous sections, still represents a significant challenge. Here we present a complete software package that implements a cross-correlation based procedure that tracks similar image features that are present in several micrographs and […]



Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository