high performance computing on graphics processing units: hgpu.org

Posts

Oct, 4

Towards accelerating Smoothed Particle Hydrodynamics simulations for free-surface flows on multi-GPU clusters

Starting from the single graphics processing unit (GPU) version of the Smoothed Particle Hydrodynamics (SPH) code DualSPHysics, a multi-GPU SPH program is developed for free-surface flows. The approach is based on a spatial decomposition technique, whereby different portions (sub-domains) of the physical system under study are assigned to different GPUs. Communication between devices is achieved […]

CUDA

Oct, 3

KERNELGEN – A Toolchain for Automatic GPU-centric Applications Porting

KernelGen is a toolchain for porting existing source code to the GPU that does not involve inserting annotations or manually programming kernels, but instead moves as much of the target source to the GPU as possible, enabling automatic adaptation of large codebases, e.g. numerical models. Separate kernels are generated for parallel loops, and the rest of the […]

CUDA • OpenCL

Oct, 3

GPU-based infrared thermography for NDE of minefields

Infrared thermography is an attractive technique for non-destructive evaluation processes and particularly for detecting shallowly buried mines. Its use consists of subjecting the area under inspection to a source of natural or artificial heating/cooling process and studying the soil’s response by means of the analysis of its thermal evolution given by a temporal sequence of […]

CUDA

Oct, 3

Performance Analysis of an Ultrasound Reconstruction Algorithm for Non Destructive Testing

The CIVA software platform developed by CEA-LIST offers various simulation and data processing modules dedicated to non-destructive testing (NDT). In particular, it provides ultrasonic imaging and reconstruction tools for localizing echoes and for identifying and sizing the detected defects. Because of the complexity of the data processed, computation time is now a limitation for […]

CUDA • OpenCL

Oct, 3

Shape Modeling and GPU Based Image Warping

This project addresses the problems of manually placing facial landmarks on a portrait and finding a fast way to warp the annotated image of a face. While there are many approaches to automatically find facial landmarks, most of them provide insufficient results in uncontrolled environments. Thus I introduce a method to manually adjust a non-rigid […]

OpenGL

Oct, 3

Orthogonalization on a general purpose graphics processing unit with double double and quad double arithmetic

Our problem is to accurately solve linear systems of modest dimensions (typically, the number of variables equals 32) on a general purpose graphics processing unit. The linear systems originate from the application of Newton’s method on polynomial systems of (moderately) large degrees. Newton’s method is applied as a corrector in a path following method, so […]

CUDA

Oct, 2

CADDIES: A New Framework for Rapid Development of Parallel Cellular Automata Algorithms for Flood Simulation

A recent trend in the development of flood simulation algorithms shows the move toward fast simplified models instead of slow full hydrodynamic models. CADDIES is a research project that aims to develop a real/near-real time pluvial urban flood simulation model using the computational speed of cellular automata (CA) algorithms. This paper presents a component of […]

OpenCL

Oct, 2

Material Removal Simulation and Cutting Force Prediction of Multi-Axis Machining Processes on General-Purpose Graphics Processing Units

The efficient planning of automated machining processes is unthinkable without the use of offline CAM systems. Though machining programs can be written and input manually, right at the machine controller, if the workpiece geometry is complex, or if the machined features are numerous, the help of CAM software is essential for generating the program both […]

CUDA

Oct, 2

Exploiting Limited Access Distance of ODE Systems for Parallelism and Locality in Explicit Methods

The solution of initial value problems of large systems of ordinary differential equations (ODEs) is computationally intensive and demands efficient parallel solution techniques that take into account the complex architectures of modern parallel computer systems. This article discusses implementation techniques suitable for ODE systems with a special coupling structure, called limited access distance, which […]

OpenCL

Oct, 2

Parallelizing LINQ Program for GPGPU

Recent technologies have brought parallel infrastructure to general users. Nowadays parallel infrastructure is available in PCs and personal laptops. Single-core machines have become history. Even multi-core technologies are being replaced by GPGPUs when it comes to high performance computing, because GPGPUs provide many cores at low cost. Sequential programs of the past are […]

CUDA

Oct, 2

Multi2Sim: a simulation framework for CPU-GPU computing

Accurate simulation is essential for the proper design and evaluation of any computing platform. Upon the current move toward the CPU-GPU heterogeneous computing era, researchers need a simulation framework that can model both kinds of computing devices and their interaction. In this paper, we present Multi2Sim, an open-source, modular, and fully configurable toolset that enables […]

OpenCL

Oct, 1

Parallel Application Library for Object Recognition

Computer vision research enables machines to understand the world. Humans usually interpret and analyze the world through what they see – the objects they capture with their eyes. Similarly, machines can better understand the world by recognizing objects in images. Object recognition is therefore a major branch of computer vision. To achieve the highest accuracy, […]

OpenCL

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

* * *

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

Most viewed papers (last 30 days)