high performance computing on graphics processing units: hgpu.org

Posts

Nov, 30

Acceleration of computational quantum chemistry by heterogeneous computer architectures

Computational quantum chemistry methods such as Hartree-Fock (HF), density functional theory (DFT) or the fragment molecular orbital (FMO) method require heavy computational resources. In this study they are accelerated using graphics processing units (GPUs) and the vector instruction set (AVX) of the latest CPUs. The PRISM algorithm for evaluating electron repulsion integrals was vectorized […]

CUDA

Nov, 30

Optimization of the Particle-based Volume Rendering for GPUs with Hiding Data Transfer Latency

In this paper, we present an optimization of particle-based volume rendering for GPU platforms. In general, data transfer between the CPU and GPU incurs long latency. Using page-locked memory from the CUDA runtime API, the data area is selected so that transfers between the CPU and GPU become faster, reducing the execution time. […]

CUDA

Nov, 30

Performance and numerical accuracy evaluation of heterogeneous multicore systems for Krylov orthogonal basis computation

We study the numerical behavior of heterogeneous systems such as CPUs with GPUs or IBM Cell processors for some orthogonalization processes. We focus on the influence of the different floating-point arithmetic handling of these accelerators on Gram-Schmidt orthogonalization using single and double precision. We observe for dense matrices a loss of at worst 1 digit […]

CUDA

Nov, 30

GPGPU Accelerated Cardiac Arrhythmia Simulations

Computational modeling of cardiac electrophysiology is a powerful tool for studying arrhythmia mechanisms. In particular, cardiac models are useful for gaining insights into experimental studies, and in the foreseeable future they will be used by clinicians to improve therapy for the patients suffering from complex arrhythmias. Such models are highly intricate, both in their geometric […]

CUDA

Nov, 30

Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization case study

Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets, moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such […]

CUDA

Nov, 30

GPU-Accelerated SPH Model for Water Waves and Other Free Surface Flows

This paper discusses the meshless numerical method Smoothed Particle Hydrodynamics and its application to water waves and nearshore circulation. In particular, we focus on an implementation of the model on the graphics processing unit (GPU) of computers, which permits low-cost supercomputing capabilities for certain types of computational problems. The implementation here runs on Nvidia graphics […]

CUDA • OpenGL

Nov, 29

Parallel preconditioning for spherical harmonics expansions of the Boltzmann transport equation

While the Monte Carlo method for the Boltzmann transport equation for semiconductors has already been parallelized, this is much more difficult to accomplish for the deterministic spherical harmonics expansion method which requires the solution of a linear system of equations. For the typically employed iterative solvers, preconditioners are required to obtain good convergence rates. These […]

OpenCL

Nov, 29

Multiresolution Flow Simulations on Multi/many-core Architectures

One of the key challenges in Computational Science is closing the gap between the available computer power and its effective utilization for the simulation of complex physical systems and engineering applications. In order to achieve this goal we must minimize the time-to-solution and the related energy requirements of simulations by developing scalable software and methods […]

CUDA

Nov, 29

Applications Performance on GPGPUs with the Fermi Architecture

The latest GPU architecture released by Nvidia, code-named "Fermi", is the most advanced computing GPU architecture ever built. Radical changes took place in the GPU computing architecture compared to Fermi’s predecessors such as the GT200 series and the G80s. In this dissertation the Fermi architecture is analysed, addressing the most prominent upgrades, by running extensive […]

CUDA

Nov, 29

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System

NVIDIA’s Compute Unified Device Architecture (CUDA) and its attached C/C++ based API went a long way towards making GPUs more accessible to mainstream programming. So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason. The high number of computational cores and high memory […]

CUDA

Nov, 29

Directives Based Programming of GPU Accelerated Systems

Graphics Processing Units (GPUs) are commodity chips primarily used as coprocessors for processing high-definition graphics on a computer system. They offer faster processing and greater efficiency in handling accurate single- and double-precision floating-point numbers, with less power consumption than CPUs. Realising their potential in general-purpose computing, manufacturers of these chips have […]

CUDA

Nov, 29

Enabling Efficient Online Profiling of Homogeneous and Heterogeneous Multicore Systems

Using profiling tools is a common way to understand computer systems and software and to achieve the best performance. Profiling becomes more important as computing technology advances and makes it more difficult to intuitively reason about system characteristics. However, the recent shift in computing technology to multicore systems and heterogeneous systems requires new profiling methods […]

CUDA • OpenCL

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs


* * *

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools