high performance computing on graphics processing units: hgpu.org

Posts

Nov, 27

Solving Sparse Linear Systems on NVIDIA Tesla GPUs

Current many-core GPUs have enormous processing power, and unlocking this power for general-purpose computing is very attractive due to their low cost and efficient power utilization. However, the fine-grained parallelism and the stream-programming model supported by these GPUs require a paradigm shift, especially for algorithm designers. In this paper we present the design of a […]

CUDA

Nov, 27

The Virtual Marathon: Parallel Computing Supports Crowd Simulations

To be realistic, an urban model must include appropriate numbers of pedestrians, vehicles, and other dynamic entities. Using a parallel computing architecture, researchers simulated a marathon with more than a million participants. To simulate participant behavior, they used fuzzy logic on a GPU to perform millions of inferences in real time.

OpenGL

Nov, 27

Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition

In large vocabulary continuous speech recognition (LVCSR) the acoustic model computations often account for the largest processing overhead. Our weighted finite state transducer (WFST) based decoding engine can utilize a commodity graphics processing unit (GPU) to perform the acoustic computations to move this burden off the main processor. In this paper we describe our new […]

CUDA

Nov, 27

Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study

The tradeoffs of accuracy and performance are as yet an unsolved problem when dealing with Graphics Processing Units (GPUs) as a general-purpose computation device. Their high performance and low cost make them a desirable target for scientific computation, and new language efforts help address the programming challenges of data parallel algorithms and memory management. But […]

Nov, 26

Evaluating the use of GPUs in liver image segmentation and HMMER database searches

In this paper we present the results of parallelizing two life sciences applications, Markov random fields-based (MRF) liver segmentation and HMMER’s Viterbi algorithm, using GPUs. We relate our experiences in porting both applications to the GPU as well as the techniques and optimizations that are most beneficial. The unique characteristics of both algorithms are demonstrated […]

CUDA

Nov, 26

Triangular matrix inversion on Graphics Processing Unit

Dense matrix inversion is a basic procedure in many linear algebra algorithms. A computationally arduous step in most dense matrix inversion methods is the inversion of triangular matrices as produced by factorization methods such as LU decomposition. In this paper, we demonstrate how triangular matrix inversion (TMI) can be accelerated considerably by using commercial Graphics […]

CUDA

Nov, 26

Graphic processing unit-accelerated mutual information-based 3D image rigid registration

Mutual information (MI)-based image registration is effective in registering medical images, but it is computationally expensive. This paper accelerates MI-based image registration by dividing computation of mutual information into spatial transformation and histogram-based calculation, and performing 3D spatial transformation and trilinear interpolation on a graphics processing unit (GPU). The 3D floating image is downloaded to GPU […]

Nov, 26

Fast Disk Encryption through GPGPU Acceleration

We present the design and performance analysis of a GPU-optimized implementation of a disk encryption application employing the XTS mode of operation applied together with the Twofish algorithm within the well-known TrueCrypt suite. We show how to correctly tune the design parameters, including data allocation, thread packing, and parallelization strategy. Overall, our implementation of TrueCrypt […]

CUDA

Nov, 26

CFD-based analysis and two-level aerodynamic optimization on Graphics Processing Units

This paper presents the porting of 2D and 3D Navier-Stokes equations solvers for unstructured grids, from the CPU to the Graphics Processing Unit (GPU; NVIDIA’s GeForce GTX 280 and 285), using the CUDA language. The performance of the GPU implementations, with single, double or mixed precision arithmetic operations, is compared to that of the CPU […]

CUDA

Nov, 26

Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors

The availability of easily programmable manycore CPUs and GPUs has motivated investigations into how to best exploit their tremendous computational power for scientific computing. Here we demonstrate how a systems biology application – detection and tracking of white blood cells in video microscopy – can be accelerated by 200 times using a CUDA-capable GPU. Because the […]

CUDA

Nov, 26

Profile-guided optimization of critical medical imaging algorithms

Given the rapid growth in computational requirements for medical image analysis, Graphics Processing Units (GPUs) have begun to be utilized to address these demands. But even though GPUs are well-suited to the underlying processing associated with medical image reconstruction, extracting the full benefits of moving to GPU platforms requires significant programming effort, and presents a […]

Nov, 26

A GPU framework for parallel segmentation of volumetric images using discrete deformable models

Despite the ability of current GPU processors to handle heavy parallel computation tasks, their use for solving medical image segmentation problems is still not fully exploited and remains challenging. Many difficulties may arise, related to, for example, the different image modalities, noise and artifacts of source images, or the shape and appearance variability […]

CUDA

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
