high performance computing on graphics processing units: hgpu.org

Posts

Dec, 22

GPU-based parallel collision detection for real-time motion planning

We present parallel algorithms to accelerate collision queries for sample-based motion planning. Our approach is designed for current many-core GPUs and exploits their data-parallelism and multi-threaded capabilities. In order to take advantage of the high number of cores, we present a clustering scheme and collision-packet traversal to perform efficient collision queries on multiple configurations simultaneously. Furthermore, […]

CUDA

Dec, 22

Efficient shallow water simulations on GPUs: Implementation, visualization, verification, and validation

In this paper, we present an efficient implementation of a state-of-the-art high-resolution explicit scheme for the shallow water equations on graphics processing units. The selected scheme is well-balanced, supports dry states, and is particularly suitable for implementation on graphics processing units. We verify and validate our implementation, and show that use of efficient single precision […]

CUDA

Dec, 22

Generative programming methods for parallel partial differential field equation solvers

This thesis describes a generative programming system that automatically constructs parallel simulations of complex systems that are based on field equations using finite differencing and explicit Runge-Kutta integration methods. Programming computational simulations by hand for different parallel architectures is both tedious and time consuming. Simulation frameworks struggle to target different architectures without losing performance. Automating […]

CUDA

Dec, 22

A GPU framework for parallel segmentation of volumetric images using discrete deformable model

Despite the ability of current GPUs to handle heavy parallel computation tasks, their use for solving medical image segmentation problems is still not fully exploited and remains challenging. Many difficulties may arise, related to, for example, the different image modalities, noise and artifacts of source images, or the shape and appearance variability […]

CUDA

Dec, 22

Extending adaptive sparse grids for stochastic collocation to hybrid parallel architectures

We are developing an adaptive sparse grid library tailored for emerging architectures that will allow the solution of stochastic problems of unprecedented size. This paper gives a brief overview of the problem at hand and presents initial results for a small GPU-based cluster. An outlook on large-scale distributed memory parallelization and our hybrid design approach […]

CUDA

Dec, 22

Solving Quadratic Programming Problems on Graphics Processing Unit

Quadratic Programming (QP) problems frequently appear as a core component when solving constrained optimal control or estimation problems. The focus of this paper is on accelerating an existing Interior Point Method (IPM) for solving QP problems by exploiting the parallel computing characteristics of the GPU. We compare the so-called data-parallel and the problem-parallel approaches to achieve speed […]

CUDA

Dec, 21

Porting and optimizing MAGFLOW on CUDA

The MAGFLOW lava simulation model is a cellular automaton developed by the Sezione di Catania of the Istituto Nazionale di Geofisica e Vulcanologia (INGV) and it represents the peak of the evolution of cell-based models for lava-flow simulation. The accuracy and adherence to reality achieved by the physics-based cell evolution of MAGFLOW comes at the […]

CUDA

Dec, 21

Computational Fluid Dynamics Simulations using Many Graphics Processors

Unsteady computational fluid dynamics simulations of turbulence are performed using up to 64 graphics processors. The results from two GPU clusters and a CPU cluster are compared. A second-order staggered-mesh spatial discretization is coupled with a low storage three-step Runge-Kutta time advancement and pressure projection at each substep. The pressure Poisson equation dominates the solution […]

CUDA

Dec, 21

Algorithm and implementation of multi-channel spike sorting using GPU in a home-care surveillance system

Intensive home-care surveillance programs are associated with a marked decrease in the need for hospitalization. They can improve the functional status of elderly patients with severe congestive diseases. The GPU-based home-care surveillance system is effective and has a greater impact on health expenditure than traditional surveillance equipment. In this work, we propose a spike sorting […]

CUDA

Dec, 21

Efficient Probabilistic and Geometric Anatomical Mapping Using Particle Mesh Approximation on GPUs

Deformable image registration in the presence of considerable contrast differences and large size and shape changes presents significant research challenges. First, it requires a robust registration framework that does not depend on intensity measurements and can handle large nonlinear shape variations. Second, it involves the expensive computation of nonlinear deformations with high degrees of freedom. […]

CUDA

Dec, 21

A simulation suite for lattice Boltzmann based real time CFD applications exploiting multi-level parallelism on modern multi- and many-core architectures

We present a software approach to hardware-oriented numerics which builds upon an augmented, previously published set of open-source libraries facilitating portable code development and optimisation on a wide range of modern computer architectures. In order to maximise efficiency, we exploit all levels of parallelism, including vectorisation within CPU cores, the Cell BE and GPUs, shared […]

CUDA

Dec, 21

Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU

Stochastic simulations need multiple replications in order to build confidence intervals for their results. Even if we do not need a large number of replications, it is good practice to reduce the overall simulation time using the Multiple Replications In Parallel (MRIP) approach. This approach usually assumes access to a parallel computer […]

CUDA
