high performance computing on graphics processing units: hgpu.org

Posts

Nov, 7

GPU-accelerated Convex Multi-phase Image Segmentation

Image segmentation is a key area of research in computer vision. Recent advances facilitated reformulation of the non-convex multi-phase segmentation problem as a convex optimization problem (see for example [2, 4, 9, 10, 13, 16]). Recently, [3] proposed a new convex relaxation approach for a class of vector-valued minimization problems, and this approach is directly […]

CUDA

Nov, 6

Real-time Sliding Phase Vocoder using a Commodity GPU

We describe a new approach to the processing of audio by way of transformations to and from the frequency domain. In previous papers we described the Sliding Discrete Fourier Transform (SDFT), comprising an extension to the classic phase vocoder algorithm to perform a frame update every sample. We proposed this as offering musical advantages over […]
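As a rough illustration of the per-sample frame update mentioned in the abstract, the standard sliding DFT recurrence can be sketched as follows. This is a minimal serial sketch in plain Python, not the authors' GPU implementation; the function names `dft` and `sliding_dft_step` are hypothetical, and no windowing or phase-vocoder processing is included.

```python
import cmath


def dft(frame):
    """Direct O(N^2) DFT of one frame, used here only as a reference."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def sliding_dft_step(bins, oldest, newest):
    """Advance all bins by one sample.

    Removes the sample leaving the window, adds the sample entering it,
    and rotates bin k by exp(2*pi*i*k/N), which costs O(N) per sample.
    """
    n = len(bins)
    return [
        (b - oldest + newest) * cmath.exp(2j * cmath.pi * k / n)
        for k, b in enumerate(bins)
    ]
```

Each bin is updated independently of the others, which is one reason an update of this kind is a plausible fit for one-thread-per-bin GPU execution.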

CUDA

Nov, 6

Parallelization of algorithms for solving the Boltzmann equation for GPU-based computations

The paper describes specific features of parallelizing collision integral computation algorithms that are dictated by the CUDA parallel architecture of graphics cards [1].

CUDA

Nov, 6

The conjugate gradient solver accelerated by GPU for solving wave-propagation problems

There are several ways to speed up an iterative solver, e.g. by applying an efficient preconditioner to decrease the number of required iterations, or by parallelizing the given algorithm. To acquire maximum performance from a massively parallel environment, different parts of such a solver must be parallelized asynchronously to avoid expensive cooperation between threads. The […]
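For orientation, the iteration being accelerated can be sketched as a textbook, unpreconditioned conjugate gradient loop. This is a minimal serial sketch in plain Python under the assumption of a symmetric positive definite system; it is not the paper's asynchronous GPU scheme, and the names `conjugate_gradient`, `matvec`, and `dot` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction, conjugate to the previous ones with respect to A.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

The two inner products per iteration are global reductions, which is where thread cooperation becomes expensive on massively parallel hardware; the matrix-vector product and the vector updates are elementwise and parallelize more readily.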

CUDA

Nov, 6

Advanced MRI reconstruction toolbox with accelerating on GPU

In this paper, we present a fast iterative magnetic resonance imaging (MRI) reconstruction algorithm taking advantage of the prevailing GPGPU programming paradigm. In clinical environments, MRI reconstruction is usually performed via fast Fourier transform (FFT). However, imaging artifacts (i.e. signal loss) resulting from susceptibility-induced magnetic field inhomogeneities degrade the quality of reconstructed images. These artifacts […]

CUDA

Nov, 6

Global memory access modelling for efficient implementation of the lattice Boltzmann method on graphics processing units

In this work, we investigate the global memory access mechanism on recent GPUs. For the purpose of this study, we created specific benchmark programs, which allowed us to explore the scheduling of global memory transactions. Thus, we formulate a model capable of estimating the execution time for a large class of applications. Our main goal […]

CUDA

Nov, 6

Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units

Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been […]

CUDA

Nov, 6

Running unstructured grid-based CFD solvers on modern graphics hardware

Techniques used to implement an unstructured grid solver on modern graphics hardware are described. The three-dimensional Euler equations for inviscid, compressible flow are considered. Effective memory bandwidth is improved by reducing total global memory access and overlapping redundant computation, as well as using an appropriate numbering scheme and data layout. The applicability of per-block shared […]

CUDA

Nov, 5

Fast QAP Solver with ACO and Taboo Search on GPU using Move-Cost Adjusted Thread Assignment

There are several studies on solving the quadratic assignment problem (QAP) with GPUs using evolutionary computation. In our previous studies [3], we applied GPU computation to solve QAPs using a distributed parallel GA model. However, no local searches were applied in those studies. In this QAP solver, we implemented a […]

CUDA

Nov, 5

Acceleration of Deterministic Boltzmann Solver with Graphics Processing Units

GPU-accelerated computation of the Boltzmann collision integral is studied using a deterministic method with piecewise approximation of the velocity distribution function and analytical integration over collision impact parameters. A 40-fold speedup over CPU calculations is achieved for a 3D problem of collisional relaxation of a bi-Maxwellian velocity distribution.

CUDA

Nov, 5

Transformation of Scientific Algorithms to Parallel Computing Code: Single GPU and MPI multi GPU Backends with Subdomain Support

We propose an approach for high-performance scientific computing that separates the description of algorithms from the generation of code for parallel hardware architectures like Multi-Core CPUs, GPUs or FPGAs. This way, a scientist can focus on his domain of expertise by describing his algorithms generically without the need to have knowledge of specific hardware architectures, […]

CUDA

Nov, 5

Challenges for compiler support for exascale computing

The compiler is central to the translation of the software we want users to write to the machine code we want to run. The scale of the applications and the choices of programming languages by users greatly complicate the role for the compiler and its analysis. The languages we use frequently don’t support rich optimizations […]

CUDA


