high performance computing on graphics processing units: hgpu.org

Posts

May, 20

Spatial Data Structures, Sorting and GPU Parallelism for Situated-agent Simulation and Visualisation

Spatial data partitioning techniques are important for obtaining fast and efficient simulations of N-Body particle and spatial agent based models where they considerably reduce redundant entity interaction computation times. Highly parallel techniques based on concurrent threading can be deployed to further speed up such simulations. We study the use of GPU accelerators and highly data […]

CUDA

May, 20

CUDA Based Enhanced Differential Evolution: a Computational Analysis

General purpose graphic programming unit (GPGPU) programming is a novel approach for solving parallel variable independent problems. The graphic processor core (GPU) gives the possibility to use multiple blocks, each of which contains hundreds of threads. Each of these threads can be visualized as a core onto itself, and tasks can be simultaneously sent to […]

CUDA

May, 19

Combustion Simulations Using Graphic Processing Units

Graphic processing units (GPUs) are powerful graphics engines featuring high levels of parallelism and extreme memory bandwidth, which constitute a powerful computing platform to solve complex problems involving chemically reacting flows. In the present study, computer programs for combustion simulations with detailed chemical kinetic mechanisms were compiled in the Compute Unified Device Architecture (CUDA) language […]

CUDA

May, 19

A GPU Algorithm for 3D Convex Hull

A novel algorithm is presented to compute the convex hull of a point set in R³ using the graphics processing unit (GPU). By exploiting the relationship between the Voronoi diagram and the convex hull, the algorithm derives the approximation of the convex hull from the former. The missed points are found back by using a two-round […]

CUDA

May, 19

Real-Time Systems with Radiation-Hardened Processors: A GPU-based Framework to Explore Tradeoffs

Radiation-hardened processors are designed to be resilient against soft errors, but such processors are slower than Commercial Off-The-Shelf (COTS) processors as well as significantly costlier. In order to mitigate the high costs, software techniques such as task re-executions must be deployed together with adequately hardened processors to provide reliability. This leads to a huge design space comprising the hardening level […]

OpenCL

May, 19

Automatic Implementation of Evolutionary Algorithms on GPUs using ESDL

Modern computer processing units tend towards simpler cores in greater numbers, favouring the development of data-parallel applications. Evolutionary algorithms are ideal for taking full advantage of SIMD (Single Instruction, Multiple Data) processing, which is available on both CPUs and GPUs. Creating software that runs on a GPU requires the use of specialised programming languages or […]

May, 19

SPOC: GPGPU Programming Through Stream Processing With OCaml

General purpose computing on graphics processing units (GPGPU) consists of using GPUs to handle computations commonly handled by CPUs. GPGPU programming implies developing specific programs to run on GPUs managed by a host program running on the CPU. To achieve high performance implies to explicitly organize memory transfers between devices. Besides, different incompatible frameworks exist […]

CUDA • OpenCL

May, 19

C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators

We describe the problem of parallelization of finite difference method (FDM) and finite element method (FEM) computations for a certain class of partial differential equations (PDEs) on a High Performance Computing (HPC) GPU cluster. For FDM, structured grids have been employed and optimal data rearrangement operations are performed in GPU computations. For FEM, unstructured triangular and […]

CUDA • OpenCL

May, 19

An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)

LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method to solve large-scale linear systems in seismic tomography. This paper presents a parallel MPI-CUDA implementation for LSQR solver. On CUDA level, our contributions include: (1) utilize CUBLAS and CUSPARSE to compute major steps in LSQR; (2) optimize memory copy between host memory […]

CUDA

May, 19

High Performance Monte Carlo and Time-Stepping Dynamics for the Classical Spin Heisenberg Model on GPUs

The Heisenberg model of classical spins makes use of both Monte Carlo stochastic dynamics as well as time-integration of its equation of motion. These two schemes have different parallelisation strategies and tradeoffs. We implement both algorithms using a data-parallel approach for Graphical Processing Units (GPUs) and we discuss the resulting performance on various combinations of […]

CUDA

May, 19

Accelerated GPU Simulation of Compressible Flow by the Discontinuous Evolution Galerkin Method

The aim of the present paper is to report on our recent results for GPU accelerated simulations of compressible flows. For numerical simulation the adaptive discontinuous Galerkin method with the multidimensional bicharacteristic based evolution Galerkin operator has been used. For time discretization we have applied the explicit third order Runge-Kutta method. Evaluation of the genuinely […]

CUDA

May, 19

Programming and Scheduling Model for Supporting Heterogeneous Accelerators in Linux

Computer systems increasingly integrate heterogeneous computing elements like graphic processing units and specialized co-processors. The systematic programming and exploitation of such heterogeneous systems is still a subject of research. While many efforts address the programming of accelerators, scheduling heterogeneous systems, i.e., mapping parts of an application to accelerators at runtime, is still performed from […]

CUDA

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs


Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)