high performance computing on graphics processing units: hgpu.org

Posts

Dec, 22

Beyond Amdahl’s Law: An Objective Function That Links Multiprocessor Performance Gains To Delay and Energy

Beginning with Amdahl’s law, we derive a general objective function that links parallel processing performance gains at the system level to energy and delay in the sub-system microarchitecture structures. The objective function employs parameterized models of computation and communication to represent the characteristics of processors, memories, and communications networks. The interaction of the latter microarchitectural […]
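The abstract takes Amdahl’s law as its starting point. As a reminder of that baseline only (this is not the paper’s derived objective function, which also models energy and delay), the classic speedup for a parallelizable fraction f of the work on p processors is S(p) = 1 / ((1 - f) + f / p). A minimal Python sketch, with function names chosen here for illustration:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Classic Amdahl speedup: S(p) = 1 / ((1 - f) + f / p)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup as p grows without limit: 1 / (1 - f)."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0.0 else 1.0 / serial
```

For example, with f = 0.9 and 10 processors the speedup is about 5.26, and no number of processors can push it past about 10. This serial-fraction ceiling is the limitation that models extending Amdahl’s law try to refine.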

Dec, 22

A block-asynchronous relaxation method for graphics processing units

In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida […]

CUDA
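To make the synchronous/asynchronous distinction in the abstract concrete, here is a minimal serial emulation under our own assumptions. A synchronous Jacobi sweep updates every component from the previous iterate. In the block-asynchronous sweep, blocks (standing in for GPU thread blocks) are visited in arbitrary order, and each reads whatever values are current, keeps off-block values fixed, and runs a few local Jacobi iterations on its own components. The names `block_size`, `local_iters`, and the helper functions are hypothetical. On a GPU, the asynchrony comes from concurrently running blocks, which this sequential code only imitates by randomizing the block order.

```python
import random


def jacobi_sweep(A, b, x):
    """One synchronous Jacobi sweep: every component reads the previous iterate."""
    n = len(b)
    x_new = [0.0] * n
    for i in range(n):
        off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x_new[i] = (b[i] - off_diag) / A[i][i]
    return x_new


def block_async_sweep(A, b, x, block_size, local_iters, rng):
    """Serial emulation of one block-asynchronous sweep (hypothetical sketch).

    Blocks are visited in random order to mimic thread blocks that run without
    a global synchronization point. Each block sees the current global x
    (entries from blocks visited earlier in this sweep are already new, others
    are stale), keeps off-block values fixed, and performs `local_iters`
    Jacobi iterations on its own components. x is updated in place.
    """
    n = len(b)
    starts = list(range(0, n, block_size))
    rng.shuffle(starts)
    for s in starts:
        e = min(s + block_size, n)
        for _ in range(local_iters):
            block_old = x[s:e]  # snapshot: Jacobi-style update inside the block
            for i in range(s, e):
                acc = 0.0
                for j in range(n):
                    if j != i:
                        xj = block_old[j - s] if s <= j < e else x[j]
                        acc += A[i][j] * xj
                x[i] = (b[i] - acc) / A[i][i]
    return x


def residual_norm(A, b, x):
    """Max-norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
```

For a strictly diagonally dominant matrix (for example, 4 on the diagonal and -1 on the neighbouring diagonals), both variants converge to the same solution. Diagonal dominance is the kind of condition under which asynchronous relaxation is known to converge regardless of update order.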

Dec, 22

AES Encryption and Decryption Using Direct3D 10 API

Current video cards (GPUs – Graphics Processing Units) are highly programmable, have become much more powerful than CPUs, and are very affordable. In this paper, we present an implementation of the AES algorithm using Direct3D 10 certified GPUs. The graphics API Direct3D 10 is the first version that allows the use of integer […]

CUDA

Dec, 22

GPU-based parallel collision detection for real-time motion planning

We present parallel algorithms to accelerate collision queries for sample-based motion planning. Our approach is designed for current many-core GPUs and exploits their data-parallel and multi-threaded capabilities. To take advantage of the high number of cores, we present a clustering scheme and collision-packet traversal to perform efficient collision queries on multiple configurations simultaneously. Furthermore, […]

CUDA

Dec, 22

Efficient shallow water simulations on GPUs: Implementation, visualization, verification, and validation

In this paper, we present an efficient implementation of a state-of-the-art high-resolution explicit scheme for the shallow water equations on graphics processing units. The selected scheme is well-balanced, supports dry states, and is particularly suitable for implementation on graphics processing units. We verify and validate our implementation, and show that the use of efficient single precision […]

CUDA

Dec, 22

Generative programming methods for parallel partial differential field equation solvers

This thesis describes a generative programming system that automatically constructs parallel simulations of complex systems that are based on field equations using finite differencing and explicit Runge-Kutta integration methods. Programming computational simulations by hand for different parallel architectures is both tedious and time-consuming. Simulation frameworks struggle to target different architectures without losing performance. Automating […]

CUDA

Dec, 22

A GPU framework for parallel segmentation of volumetric images using discrete deformable model

Despite the ability of current GPU processors to handle heavy parallel computation tasks, their use for solving medical image segmentation problems is still not fully exploited and remains challenging. Many difficulties may arise, related, for example, to the different image modalities, noise and artifacts of source images, or the shape and appearance variability […]

CUDA

Dec, 22

Extending adaptive sparse grids for stochastic collocation to hybrid parallel architectures

We are developing an adaptive sparse grid library tailored for emerging architectures that will allow the solution of stochastic problems of unprecedented size. This paper gives a brief overview of the problem at hand and presents initial results for a small GPU-based cluster. An outlook on large-scale distributed memory parallelization and our hybrid design approach […]

CUDA

Dec, 22

Solving Quadratic Programming Problems on Graphics Processing Unit

Quadratic Programming (QP) problems frequently appear as a core component when solving constrained optimal control or estimation problems. The focus of this paper is on accelerating an existing Interior Point Method (IPM) for solving QP problems by exploiting the parallel computing characteristics of the GPU. We compare the so-called data-parallel and the problem-parallel approaches to achieve speed […]

CUDA

Dec, 21

Porting and optimizing MAGFLOW on CUDA

The MAGFLOW lava simulation model is a cellular automaton developed by the Sezione di Catania of the Istituto Nazionale di Geofisica e Vulcanologia (INGV) and it represents the peak of the evolution of cell-based models for lava-flow simulation. The accuracy and adherence to reality achieved by the physics-based cell evolution of MAGFLOW comes at the […]

CUDA

Dec, 21

Computational Fluid Dynamics Simulations using Many Graphics Processors

Unsteady computational fluid dynamics simulations of turbulence are performed using up to 64 graphics processors. The results from two GPU clusters and a CPU cluster are compared. A second-order staggered-mesh spatial discretization is coupled with a low storage three-step Runge-Kutta time advancement and pressure projection at each substep. The pressure Poisson equation dominates the solution […]

CUDA
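The abstract mentions a "low storage three-step Runge-Kutta time advancement" without naming the coefficients. As an assumption, the sketch below uses Williamson’s widely used 2N-storage third-order scheme (A = 0, -5/9, -153/128; B = 1/3, 15/16, 8/15). Each stage updates an accumulator, dq ← A_k·dq + Δt·f(q), and then the solution, q ← q + B_k·dq, so only q and dq need to persist between stages. The pressure projection performed at each substep in the paper is omitted here, and `rk3_low_storage` is a hypothetical name.

```python
import math

# Williamson 2N-storage RK3 coefficients (assumed; the abstract does not
# state which low-storage scheme the paper uses).
RK3_A = (0.0, -5.0 / 9.0, -153.0 / 128.0)
RK3_B = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)


def rk3_low_storage(f, q, dt, n_steps):
    """Advance dq/dt = f(q) by n_steps steps of size dt with a 2N-storage RK3.

    Only two state-sized arrays persist between stages: the solution q and
    the accumulator dq. This Python version allocates temporaries for
    clarity; a GPU kernel would update both arrays in place.
    """
    q = list(q)
    dq = [0.0] * len(q)
    for _ in range(n_steps):
        for a, b in zip(RK3_A, RK3_B):
            rhs = f(q)
            # a = 0 in the first stage, so dq needs no explicit reset per step.
            dq = [a * d + dt * r for d, r in zip(dq, rhs)]
            q = [qi + b * d for qi, d in zip(q, dq)]
    return q
```

On the test problem dq/dt = -q with q(0) = 1, halving Δt should reduce the error at t = 1 by roughly a factor of 8, which is consistent with third-order accuracy.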

Dec, 21

Algorithm and implementation of multi-channel spike sorting using GPU in a home-care surveillance system

Intensive home-care surveillance programs are associated with a marked decrease in the need for hospitalization. They can improve the functional status of elderly patients with severe congestive diseases. The GPU-based home-care surveillance system is effective and has a greater impact on health expenditure than traditional surveillance equipment. In this work, we propose a spike sorting […]

CUDA



Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)