high performance computing on graphics processing units: hgpu.org

Posts

Nov, 17

Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing

We present a Hessenberg reduction (HR) algorithm for hybrid systems of homogeneous multicore with GPU accelerators that can exceed 25× the performance of the corresponding LAPACK algorithm running on current homogeneous multicores. This enormous acceleration is due to proper matching of algorithmic requirements to architectural strengths of the system

CUDA

Nov, 17

Direct numerical simulation of sub-grid structures in gas-solid flow — GPU implementation of macro-scale pseudo-particle modeling

Due to significant multi-scale heterogeneity, understanding sub-grid structures is critical to effective continuum-based description of gas-solid flow. However, it is challenging for both physical measurements and numerical simulations. In this article, with the macro-scale pseudo-particle method (MaPPM) implemented on a GPU-based HPC system, up to 30,000 fluidized solids are simulated using the N-S equation directly. […]

Nov, 17

A multi-GPU accelerated solver for the three-dimensional two-phase incompressible Navier-Stokes equations

The use of graphics hardware for general purpose computations allows scientists to enormously speed up their numerical codes. We presently investigate the impact of this technology on our computational fluid dynamics solver for the three-dimensional two-phase incompressible Navier-Stokes equations, which is based on the level set technique and applies Chorin

CUDA

Nov, 16

Fast Bio-Inspired Computation using a GPU-based Systemic Computer

Biology is inherently parallel. Models of biological systems and bio-inspired algorithms also share this parallelism, although most are simulated on serial computers. Previous work created the systemic computer – a new model of computation designed to exploit many natural properties observed in biological systems, including parallelism. The approach has been proven through two existing implementations […]

Nov, 16

Domain Decomposition method on GPU cluster

Parallel GPGPU computing for lattice QCD simulations has a bottleneck in GPU-to-GPU data communication due to the lack of a direct data exchange facility. In this work we investigate the performance of a quark solver using the restricted additive Schwarz (RAS) preconditioner on a low-cost GPU cluster. We expect that the RAS […]

CUDA

Nov, 16

Real-time nonlinear finite element computations on GPU – Application to neurosurgical simulation

Application of biomechanical modeling techniques in the area of medical image analysis and surgical simulation implies two conflicting requirements: accurate results and high solution speeds. Accurate results can be obtained only by using appropriate models and solution algorithms. In our previous papers we have presented algorithms and solution methods for performing accurate nonlinear finite element […]

CUDA

Nov, 16

Illustrative Volume Visualization Using GPU-Based Particle Systems

Illustrative techniques are generally applied to produce stylized renderings. Various illustrative styles have been applied to volumetric data sets, producing clearer images and effectively conveying visual information. We adopt particle systems to produce user-configurable stylized renderings from the volume data, imitating traditional pen-and-ink drawings. In the following, we present an interactive GPU-based illustrative volume rendering […]

OpenGL

Nov, 16

GPU-based smart visibility techniques for tumor surgery planning

PURPOSE: The rating of distances and infiltrations to vital structures is important for the planning of tumor surgery or interventional procedures. To support such an assessment, the target structures should be clearly emphasized in a 3D visualization by ensuring their visibility. METHODS: Smart Visibility techniques such as Ghosting Views and Breakaway Views are employed. Ghosting […]

OpenGL

Nov, 16

GPU rendering for tiled multi-projector autostereoscopic display based on chromium

In this paper, a GPU-based high-resolution multiview rendering approach (HRMVRA) is presented and incorporated into Chromium, and then a tiled multi-projector autostereoscopic display system (TMPADS) based on HRMVRA is constructed to provide an immersing 3D perception and a compelling sense of presence without the need of glasses for viewers. HRMVRA renders the multiview images in […]

OpenGL

Nov, 16

Multi-scale neural texture classification using the GPU as a stream processing engine

A neural architecture for texture classification running on the Graphics Processing Unit (GPU) under a stream processing model is presented in this paper. Textural feature extraction is done at three different scales; it is based on the computations that take place in the mammalian primary visual pathway and incorporates both structural and color information. Feature […]

OpenGL

Nov, 16

Development of a GPU-based High-Performance Radiative Transfer Model for the Infrared Atmospheric Sounding Interferometer (IASI)

Satellite-observed radiance is a nonlinear functional of surface properties and atmospheric temperature and absorbing gas profiles as described by the radiative transfer equation (RTE). In the era of hyperspectral sounders with thousands of high-resolution channels, the computation of the radiative transfer model becomes more time-consuming. The radiative transfer model performance in operational numerical weather prediction […]

CUDA

Nov, 16

High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Motivated by high computation power and low price per performance ratio of GPUs, GPU accelerated clusters are being built for high performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. Basic computations of […]

CUDA
