high performance computing on graphics processing units: hgpu.org

Posts

Mar, 1

Using GPUs to Improve Multigrid Solver Performance on a Cluster

This article explores the coupling of coarse and fine-grained parallelism for Finite Element simulations based on efficient parallel multigrid solvers. The focus lies on both system performance and a minimally invasive integration of hardware acceleration into an existing software package, requiring no changes to application code. Because of their excellent price-performance ratio, we demonstrate […]

OpenGL

Mar, 1

Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU

We have previously presented an approach to include graphics processing units as co-processors in a parallel Finite Element multigrid solver called FEAST. In this paper we show that the acceleration transfers to real applications built on top of FEAST, without any modifications of the application code. The chosen solid mechanics code is well suited to […]

CUDA

Mar, 1

Performance of inverse atomistic scale fracture modeling on GPGPU architectures

The present work has been motivated by the continuous growth of General Purpose Graphics Processing Unit (GPGPU) technologies as well as the necessity of linking usability with multiscale materials processing and design. The inverse problem of determining the phenomenological interparticle Lennard-Jones potential governing the fracture dynamics of a two-dimensional structure under tension is used […]

Mar, 1

FEAST – Realisation of hardware-oriented Numerics for HPC simulations with Finite Elements

FEAST (Finite Element Analysis & Solutions Tools) is a Finite Element based solver toolkit for the simulation of PDE problems on parallel HPC systems which implements the concept of “hardware-oriented numerics”, a holistic approach aiming at optimal performance for modern numerics. In this paper, we describe this concept and the modular design which enables applications […]

CUDA

Mar, 1

Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs

Fast, robust and efficient multigrid solvers are a key numerical tool in the solution of partial differential equations discretised with finite elements. The vast majority of practical simulation scenarios requires that the underlying grid is unstructured, and that high-order discretisations are used. On the other hand, hardware is quickly evolving towards parallelism and heterogeneity, even […]

CUDA

Mar, 1

Performance and accuracy of Lattice-Boltzmann kernels on multi- and manycore architectures

We present different kernels based on Lattice-Boltzmann methods for the solution of the two-dimensional Shallow Water and Navier-Stokes equations on fully structured lattices. The functionality ranges from simple scenarios like open-channel flows with planar beds to simulations with complex scene geometries like solid obstacles and non-planar bed topography with dry-states and even interaction of the […]

CUDA

Mar, 1

Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors

We present an efficient method for the simulation of laminar fluid flows with free surfaces including their interaction with moving rigid bodies, based on the two-dimensional shallow water equations and the Lattice-Boltzmann method. Our implementation targets multiple fundamentally different architectures such as commodity multicore CPUs with SSE, GPUs, the Cell BE and clusters. We show […]

CUDA

Mar, 1

A Simulation Suite for Lattice-Boltzmann based Real-Time CFD Applications Exploiting Multi-Level Parallelism on modern Multi- and Many-Core Architectures

We present a software approach to hardware-oriented numerics which builds upon an augmented, previously published open-source set of libraries facilitating portable code development and optimisation on a wide range of modern computer architectures. In order to maximise efficiency, we exploit all levels of parallelism, including vectorisation within CPU cores, the Cell BE and GPUs, shared […]

CUDA

Feb, 28

Accelerating Molecular Dynamics Simulations with GPUs

Molecular dynamics simulations are known to run for many days or weeks before completion. In this paper we explore the use of GPUs to accelerate a Lennard-Jones-based molecular dynamics simulation of up to 27000 atoms. We demonstrate speedups that exceed 100x on commodity Nvidia GPUs and discuss the strategies that allow for such exceptional speedups. […]

CUDA

Feb, 28

Fast and Accurate Generalized Harmonic Analysis and Its Parallel Computation by GPU

A fast and accurate method for Generalized Harmonic Analysis is proposed. The proposed method estimates the parameters of a sinusoid and subtracts it from a target signal one by one. The frequency of the sinusoid is estimated around a peak of Fourier spectrum using binary search. The binary search can control the trade-off between the […]

CUDA

Feb, 28

A Case Study for Petascale Applications in Astrophysics: Simulating Gamma-Ray Bursts

Petascale computing will allow astrophysicists to investigate astrophysical objects, systems, and events that cannot be studied by current observational means and that were previously excluded from computational study by sheer lack of CPU power and appropriate codes. Here we present a pragmatic case study, focussing on the simulation of gamma-ray bursts as a science driver […]

CUDA

Feb, 28

A general relativistic evolution code on CUDA architectures

I describe the implementation of a finite-differencing code for solving Einstein’s field equations on a GPU, and measure speed-ups compared to a serial code on a CPU for different parallelization and caching schemes. Using the most efficient scheme, the (single precision) GPU code on an NVIDIA Quadro FX 5600 is shown to be up to […]

CUDA
