high performance computing on graphics processing units: hgpu.org

Posts

Mar, 22

Bridging the GPGPU-FPGA efficiency gap

This paper compares an implementation of a Bayesian inference algorithm across several FPGAs and GPGPUs, while embracing both the execution model and high-level architecture of a GPGPU. Our study is motivated by recent work in template-based programming and architectural models for FPGA computing. The comparison we present is meant to demonstrate the FPGA’s potential, while […]

OpenCL

Mar, 22

Improving accuracy for matrix multiplications on GPUs

Reproducibility of an experiment is a commonly used metric to determine its validity. Within scientific computing, reproducibility can become difficult due to the accumulation of floating-point rounding errors in the numerical computation, which can greatly reduce the accuracy of the result. Matrix multiplication is particularly susceptible to these rounding errors, which is why there exist so […]

CUDA
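The abstract does not say which technique the paper uses. One standard way to curb accumulated rounding error in a dot product is compensated summation. The sketch below applies Neumaier's variant of Kahan summation to a plain Python matrix product. The function names are our own, and only the accumulation error is compensated: each individual product is still rounded.

```python
def neumaier_sum(values):
    """Compensated summation (Neumaier's variant of Kahan summation)."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            # Low-order bits of x were lost in the addition.
            compensation += (total - t) + x
        else:
            # Low-order bits of total were lost in the addition.
            compensation += (x - t) + total
        total = t
    return total + compensation


def matmul(a, b, compensated=True):
    """Multiply row-major matrices a (n x k) and b (k x m)."""
    k = len(b)
    m = len(b[0])
    result = []
    for row in a:
        out_row = []
        for j in range(m):
            products = [row[p] * b[p][j] for p in range(k)]
            if compensated:
                out_row.append(neumaier_sum(products))
            else:
                s = 0.0
                for v in products:  # naive left-to-right accumulation
                    s += v
                out_row.append(s)
        result.append(out_row)
    return result


a = [[1e16, 1.0, -1e16]]
b = [[1.0], [1.0], [1.0]]
print(matmul(a, b, compensated=False))  # [[0.0]]: the 1.0 is absorbed by 1e16
print(matmul(a, b, compensated=True))   # [[1.0]]: the lost bit is recovered
```

The compensation term costs a few extra floating-point operations per element, which is the usual trade-off against a plain accumulator.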

Mar, 22

Evaluating force field accuracy with long-time simulations of a beta-hairpin tryptophan zipper peptide

We have combined graphics processing unit-accelerated all-atom molecular dynamics with parallel tempering to explore the folding properties of small peptides in implicit solvent on the time scale of microseconds. We applied this methodology to the synthetic beta-hairpin, trpzip2, and one of its sequence variants, W2W9. Each simulation consisted of over 8 ms of aggregated virtual […]

CUDA
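The abstract does not detail the exchange step. In standard parallel tempering (replica exchange), configurations held at two temperatures are swapped with the Metropolis probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). A minimal sketch of that acceptance test, with hypothetical function names:

```python
import math
import random


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging configurations between two replicas.

    beta = 1 / (k_B * T); energy_i is the potential energy of the configuration
    currently held by the replica at inverse temperature beta_i.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def maybe_swap(beta_i, beta_j, energy_i, energy_j, rng):
    """Return True if the exchange is accepted (rng.random() is in [0, 1))."""
    return rng.random() < swap_acceptance(beta_i, beta_j, energy_i, energy_j)


rng = random.Random(0)
# The colder replica (larger beta) holds the higher-energy configuration,
# so the swap lowers the total weighted energy and is always accepted.
print(maybe_swap(1.0, 0.5, 10.0, 5.0, rng))
```

Because acceptance depends only on two energies and two temperatures, the expensive part of each cycle remains the independent MD propagation of every replica, which is the work a GPU accelerates.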

Mar, 22

Constructing Two-Dimensional Voronoi Diagrams via Divide-and-Conquer of Envelopes in Space (thesis)

We present a general framework for computing two-dimensional Voronoi diagrams of different classes of sites under various distance functions. The framework is sufficiently general to support diagrams embedded on a family of two-dimensional parametric surfaces in $R^3$. The computation of the diagrams is carried out through the construction of envelopes of surfaces in 3-space provided […]

Mar, 22

Constructing Two-Dimensional Voronoi Diagrams via Divide-and-Conquer of Envelopes in Space

We present a general framework for computing Voronoi diagrams of different classes of sites under various distance functions in $R^3$. Most diagrams mentioned in the paper are in the plane. However, the framework is sufficiently general to support diagrams embedded on a family of two-dimensional parametric surfaces in three dimensions. The computation of the diagrams is […]
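The envelope view behind both entries can be illustrated without the divide-and-conquer machinery: each site defines a distance surface over the plane, and the Voronoi diagram is the projection of the lower envelope of those surfaces, i.e. the argmin over sites at every point. The brute-force grid sketch below (our own names, O(number of sites) work per pixel) only demonstrates that definition; it is not the algorithm of either paper. Swapping the `distance` argument mirrors the framework's support for different distance functions.

```python
import math


def euclidean(site, point):
    return math.hypot(point[0] - site[0], point[1] - site[1])


def manhattan(site, point):
    return abs(point[0] - site[0]) + abs(point[1] - site[1])


def minimization_diagram(sites, width, height, distance=euclidean):
    """Label each integer grid point with the index of its nearest site.

    Each site i defines a surface z = distance(site_i, (x, y)); taking the
    argmin over i at every (x, y) is the projection of their lower envelope.
    Ties go to the lowest site index.
    """
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            dists = [distance(s, (x, y)) for s in sites]
            row.append(min(range(len(sites)), key=dists.__getitem__))
        labels.append(row)
    return labels


for row in minimization_diagram([(1, 1), (6, 2), (3, 5)], 8, 7, manhattan):
    print("".join(str(label) for label in row))
```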

Mar, 22

fastHOG – a real-time GPU implementation of HOG

We introduce a parallel implementation of the histogram of oriented gradients (HOG) algorithm for object detection. Our implementation uses the GPU and the NVIDIA CUDA framework. We achieve speedups of more than 67x over the standard sequential code using a single video card. Furthermore, it supports multiple video cards, so speedups of 120x or more can be […]

CUDA
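At the core of HOG is a per-cell histogram of gradient orientations, weighted by gradient magnitude. The simplified sketch below (central-difference gradients, unsigned orientations in [0°, 180°), 9 bins, and hard binning without the interpolation and block normalization of the full descriptor) shows that step on a small grayscale patch. It is our own illustration, not fastHOG's CUDA code; each pixel's contribution is computed independently, which is what makes the step suit GPU threads.

```python
import math


def cell_histogram(img, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    img is a list of rows of grayscale values; border pixels are skipped
    because central differences need both neighbours.
    """
    h, w = len(img), len(img[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to [0, 180)
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


ramp = [[x for x in range(4)] for _ in range(4)]  # intensity rises left to right
print(cell_histogram(ramp))
```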

Mar, 22

Hierarchical belief propagation to reduce search space using CUDA for stereo and motion estimation

This paper describes a hierarchical belief propagation implementation in which a ‘rough’ disparity map calculation or motion estimation in higher levels is used to limit the search space and enable the calculation of the desired disparity map/set of motion vectors using a smaller search space than traditional belief propagation. We implement our algorithm on the […]

CUDA
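The abstract does not specify how the rough estimate constrains the next level. One plausible reading is sketched below under assumptions of our own: resolution halves per pyramid level, and the finer level searches a fixed radius around the upsampled coarse estimate. All names are hypothetical. Under these assumptions, the per-pixel search space shrinks from `max_disparity + 1` labels to at most `2 * radius + 1`.

```python
def fine_level_candidates(coarse_disparity, radius, max_disparity, scale=2):
    """Disparity labels to search at the finer level for one pixel.

    Assumes each level halves the resolution (scale=2), so a coarse
    disparity d corresponds to roughly scale * d at the finer level.
    """
    center = scale * coarse_disparity
    lo = min(max(0, center - radius), max_disparity)
    hi = min(max_disparity, center + radius)
    return list(range(lo, hi + 1))


def candidate_map(coarse_map, radius, max_disparity, scale=2):
    """Upsample a coarse disparity map by pixel replication, attaching ranges."""
    fine = []
    for row in coarse_map:
        fine_row = []
        for d in row:
            candidates = fine_level_candidates(d, radius, max_disparity, scale)
            fine_row.extend([candidates] * scale)  # shared, read-only lists
        for _ in range(scale):
            fine.append(list(fine_row))
    return fine


print(fine_level_candidates(5, 2, 64))
```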

Mar, 22

GPU implementation of belief propagation using CUDA for cloud tracking and reconstruction

This paper describes an efficient CUDA-based GPU implementation of the belief propagation algorithm that can be used to speed up stereo image processing and motion tracking calculations without loss of accuracy. Preliminary results in using belief propagation to analyze satellite images of Hurricane Luis for real-time cloud structure and tracking are promising, with speed-ups of […]

CUDA
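Each belief propagation iteration updates, for every pixel and each of its neighbours, a message over all candidate labels. These updates are independent across pixels, which is what a GPU implementation parallelizes. Below is a scalar sketch of the standard min-sum message update with a truncated linear smoothness cost, a common choice for stereo; the abstract does not state which cost the paper uses, and the names are ours. The inner loop is the naive O(L²) form over L labels.

```python
def compute_message(data_cost, incoming, smooth_weight, truncation):
    """One min-sum belief propagation message from pixel p to neighbour q.

    data_cost[d] : cost of assigning label d at p
    incoming     : messages into p from its neighbours other than q
    Smoothness   : V(d_p, d_q) = min(smooth_weight * |d_p - d_q|, truncation)
    """
    n = len(data_cost)
    # h(d_p): data cost plus every incoming message except the one from q.
    h = [data_cost[d] + sum(m[d] for m in incoming) for d in range(n)]
    msg = [
        min(h[dp] + min(smooth_weight * abs(dp - dq), truncation) for dp in range(n))
        for dq in range(n)
    ]
    # Normalise so the smallest entry is zero; keeps values bounded over iterations.
    lowest = min(msg)
    return [v - lowest for v in msg]


print(compute_message([0, 5, 5], [], smooth_weight=1, truncation=2))
```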

Mar, 21

GPU implementation of fast Gabor filters

With their parallel multi-core architecture, Programmable Graphics Processing Units (GPUs) are well suited for implementing biologically-inspired visual processing algorithms, such as Gabor filtering. We compare several GPU implementations of Gabor filtering. On the same graphics card (an NVIDIA GeForce 9800 GTX+) and for convolution kernel radii from 8 to 48 pixels, an algorithm that decomposes […]
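For reference, one common parameterization of a real-valued 2D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier) is sketched below. The parameter names are ours, and the abstract does not say which parameterization the paper uses. Direct 2D convolution with such a kernel costs (2r + 1)² multiply-adds per output pixel, which grows quickly over the 8 to 48 pixel radii mentioned above; this is why cheaper formulations matter at larger radii.

```python
import math


def gabor_kernel(radius, wavelength, theta, sigma, gamma=1.0, phase=0.0):
    """Real (cosine) Gabor kernel of size (2*radius + 1) x (2*radius + 1).

    theta rotates the carrier, wavelength is in pixels, sigma is the width
    of the Gaussian envelope, and gamma is the spatial aspect ratio.
    """
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            x_r = x * math.cos(theta) + y * math.sin(theta)   # rotated coordinates
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * x_r / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


k = gabor_kernel(radius=8, wavelength=10.0, theta=math.pi / 4, sigma=4.0)
print(len(k), len(k[0]))
```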

Mar, 21

GPU-based password cracking

In this research, the following question is answered: what should KPMG advise its clients regarding password length and complexity, now that GPU-based password cracking has become a reality? To answer this question, tests with different tools and hashes were performed on a system with four high-end GPUs. The test system showed […]

CUDA
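The trade-off between password length and complexity comes down to keyspace arithmetic: an exhaustive attack on passwords of length n over an alphabet of size c tries cⁿ candidates, so the worst-case time is cⁿ divided by the hash rate. The sketch below uses a hypothetical rate of 10¹⁰ hashes per second for illustration; it is not a figure from the report, and real rates vary greatly with the hash algorithm.

```python
def keyspace(charset_size, length):
    """Number of candidate passwords of exactly `length` characters."""
    return charset_size ** length


def worst_case_seconds(charset_size, length, hashes_per_second):
    """Time to exhaust the keyspace at an assumed constant hash rate."""
    return keyspace(charset_size, length) / hashes_per_second


# Hypothetical rate of 1e10 hashes/s, not a measurement from the report.
for charset, name in [(26, "lowercase"), (62, "alphanumeric"), (95, "printable ASCII")]:
    for length in (8, 10):
        days = worst_case_seconds(charset, length, 1e10) / 86400
        print(f"{name:16s} length {length:2d}: {days:,.1f} days")
```

Doubling the hash rate (for example, by adding GPUs) only halves the time, whereas each extra character multiplies it by the alphabet size.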

Mar, 21

GPU Accelerated VLSI Design Verification

Today’s Very Large Scale Integrated-Circuit (VLSI) designs require intensive verification effort. However, traditional sequential verification solutions can no longer provide the scalability needed for future large designs. This so-called verification gap hinders the development of future VLSI products. In this paper, we review our recent work on accelerating typical VLSI verification tasks with modern GPUs. Our […]

Mar, 21

An evaluation of GPU acceleration for sparse reconstruction

Image processing applications typically parallelize well. This gives a developer interested in data throughput several implementation options, including multiprocessor machines, general-purpose computation on the graphics processor, and custom gate-array designs. Herein, we investigate the first two of these options for dictionary learning and sparse reconstruction, specifically focusing on the K-SVD algorithm for dictionary learning […]
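K-SVD alternates a sparse-coding step with a dictionary-update step. The sparse-coding step is often done with orthogonal matching pursuit (OMP). As a simpler stand-in, the sketch below implements plain matching pursuit in pure Python for unit-norm atoms. The names are ours, and nothing here reflects the paper's implementation. The per-atom correlations are independent dot products, the part that maps most naturally onto parallel hardware.

```python
def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding of `signal` over unit-norm `atoms`.

    Returns (coefficients, residual), where coefficients maps atom index to
    weight. This is plain matching pursuit; K-SVD implementations usually
    use OMP, which re-fits all selected coefficients by least squares.
    """
    residual = list(signal)
    coeffs = {}
    for _ in range(n_nonzero):
        # Correlate the residual with every atom and keep the strongest match.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        if scores[best] == 0.0:
            break  # residual is orthogonal to every atom
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matching_pursuit([0, 3, 0], identity, n_nonzero=2))
```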

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs


Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools
