high performance computing on graphics processing units: hgpu.org

Posts

Jan, 26

Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem

Modern video cards and game consoles typically have much better performance-to-price ratios than general-purpose CPUs. The parallel processing capabilities of game hardware are well suited to high-throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.

CUDA

Jan, 26

Efficient Bayesian inference in stochastic chemical kinetic models using graphical processing units

A goal of systems biology is to understand the dynamics of intracellular systems. Stochastic chemical kinetic models are often used to capture the stochastic nature of these systems, which arises from low numbers of molecules. Collecting system data allows for estimation of stochastic chemical kinetic rate parameters. We describe a well-known, but typically impractical data […]

CUDA

Jan, 25

Fast Schedulability Analysis Using Commodity Graphics Hardware

In this paper we explore the possibility of using commodity graphics processing units (GPUs) to speed up standard schedulability analysis algorithms. Our long-term goal is to exploit GPUs to accelerate common electronic design automation algorithms, most of which tend to be computationally expensive. Our main contribution in this paper is a reformulation of a standard demand […]

OpenGL

Jan, 25

Molecular dynamics simulation of the supercooled Al melt on GPUs

The method of molecular dynamics (MD) is widely used to study static and dynamic properties of condensed matter [1]. In particular, an approach to study the relaxation of metastable states is developed [2]. These states play an essential role in impulse loading processes such as shock compression, laser ablation, etc. Herewith we report on […]

Jan, 25

Simulation of stochastic processes using graphics hardware

Graphics Processing Units (GPUs) were originally designed to manipulate images, but due to their intrinsic parallel nature, they turned into a powerful tool for scientific applications. In this article, we evaluated GPU performance in an implementation of a traditional stochastic simulation – the correlated Brownian motion. This movement can be described by the Generalized Langevin […]

Jan, 25

Molecular dynamics simulations of the relaxation processes in the condensed matter on GPUs

We report on a simulation technique and benchmarks for molecular dynamics simulations of the relaxation processes in solids and liquids using graphics processing units (GPUs). The implementation of a many-body potential such as the embedded atom method (EAM) on the GPU is discussed. The benchmarks obtained with the LAMMPS and HOOMD packages for simple Lennard-Jones liquids and […]

Jan, 25

Efficient Parallel Implementation of Molecular Dynamics with Embedded Atom Method on Multi-core Platforms

We present a scalable spatial decomposition coloring approach to implement molecular dynamics simulations with the embedded atom method (EAM) on multi-core architectures. It effectively solves the parallelization of reduction operations on irregular arrays in molecular dynamics simulations. In the OpenMP programming model, our methodology prevents the same memory location from being modified simultaneously by more than one thread […]

Jan, 25

A Graphics Processing Unit Implementation of Coulomb Interaction in Molecular Dynamics

We report a GPU implementation in HOOMD Blue of long-range electrostatic interactions based on the orientation-averaged Ewald sum scheme, introduced by Yakub and Ronchi (J. Chem. Phys. 2003, 119, 11556). The performance of the method is compared to an optimized CPU version of the traditional Ewald sum available in LAMMPS, in the molecular dynamics of […]

Jan, 25

Multi-Level Ewald: A Hybrid Multigrid/Fast Fourier Transform Approach to the Electrostatic Particle-Mesh Problem

We present a new method for decomposing the one convolution required by standard Particle-Particle Particle-Mesh (P3M) electrostatic methods into a series of convolutions over slab-shaped subregions of the original simulation cell. Most of the convolutions derive data from separate regions of the cell and can thus be computed independently via FFTs, in some cases with […]

Jan, 25

Parallel multiclass classification using SVMs on GPUs

The scaling of serial algorithms can no longer rely on the improvement of CPUs. The performance of classical Support Vector Machine (SVM) implementations has reached its limit, and the arrival of the multi-core era requires these algorithms to adapt to a new parallel scenario. Graphics Processing Units (GPUs) have arisen as high performance platforms to […]

Jan, 25

Fast Calculation of Electrostatic Potentials on the GPU or the ASIC MD-GRAPE-3

Electrostatic potentials (ESPs) are frequently used in structural biology for the characterization of biomolecules. Here we study the potential employment of hardware accelerators like the graphics processing unit or the application-specific integrated circuit MD-GRAPE-3 for the purpose of efficient computation of ESPs. An algorithm closely coupled to the general description of molecular surfaces is ported […]

Jan, 25

A New Era in Scientific Computing: Domain Decomposition Methods in Hybrid CPU-GPU Architectures

Recent advances in graphics processing unit (GPU) technology open a new era in high performance computing. Applications of GPUs to scientific computations are attracting a lot of attention due to their low cost in conjunction with their inherently remarkable performance features and the recently enhanced computational precision and improved programming tools. Domain decomposition methods (DDM) […]

Recent source codes

DITRON: Distributed Compiler based on Triton for Parallel Systems

IntelliKit: Agent-first tooling for AMD hardware

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

Agentic Code Optimization via Compiler-LLM Cooperation

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
