high performance computing on graphics processing units: hgpu.org

The Shamrock code: I- Smoothed Particle Hydrodynamics on GPUs

Timothée David--Cléris, Guillaume Laibe, Yona Lapeyre

Tags: AMD, AMD Radeon Instinct MI250X, Astrophysics, CUDA, MPI, nVidia, nVidia A100, OpenMP, Package, Physics, PTX, ROCm, SYCL

March 23, 2025 by hgpu

Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement

Junjie Li

Tags: Computer science, CUDA, HPC, Linear Algebra, nVidia, nVidia GH200, OpenMPI, Physics, Quantum Physics

January 6, 2025 by hgpu

A comparison of HPC-based quantum computing simulators using Quantum Volume

Lourens van Niekerk, Dhiraj Kumar, Aasish Kumar Sharma, Tino Meisel, Martin Leandro Paleico, Christian Boehme

Tags: Benchmarking, CUDA, nVidia, nVidia A100, OpenCL, Overview, Physics, Quantum computing, Review

January 6, 2025 by hgpu

Asynchronous-Many-Task Systems: Challenges and Opportunities – Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX

Gregor Daiß, Patrick Diehl, Jiakun Yan, John K. Holmen, Rahulkumar Gayatri, Christoph Junghans, Alexander Straub, Jeff R. Hammond, Dominic Marcello, Miwako Tsuji, Dirk Pflüger, Hartmut Kaiser

Tags: AMD Radeon Instinct MI100, AMD Radeon Instinct MI250X, Astrophysics, ATI, Computer science, CUDA, Heterogeneous systems, HIP, HPC, nVidia, nVidia A100, Package, performance portability, Physics

December 29, 2024 by hgpu

TorchQC – A framework for efficiently integrating machine and deep learning methods in quantum dynamics and control

Dimitris Koutromanos, Dionisis Stefanatos, Emmanuel Paspalakis

Tags: Deep learning, Machine learning, Package, Physics, Python, PyTorch, Quantum Physics

December 29, 2024 by hgpu

CLUEstering: a high-performance density-based clustering library for scientific computing

Simone Balducci

Tags: Astrophysics, ATI, Clustering, CUDA, FPGA, HIP, Machine learning, nVidia, Package, performance portability, Physics, Python, Tesla T4, Thesis

December 1, 2024 by hgpu

Scaling SU(2) to 1000 GPUs using HiRep

Sofie Martins, Erik Kjellgren, Emiliano Molinaro, Claudio Pica, Antonio Rago

Tags: AMD Radeon Instinct MI250X, ATI, CUDA, HEP, High Energy Physics - Lattice, HIP, Monte Carlo simulation, nVidia, nVidia H100, Package, Physics

December 1, 2024 by hgpu

Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores

Gabin Schieffer, Ivy Peng

Tags: Chemistry, Computer science, CUDA, Macromolecule, molecular docking, nVidia, nVidia A100, nVidia V100, Physics, Tesla T4

October 20, 2024 by hgpu

Effects of OpenCL-Based Parallelization Methods on Explicit Numerical Methods to Solve the Heat Equation

Dániel Koics, Endre Kovács, Olivér Hornyák

Tags: nVidia, OpenCL, Package, PDEs, Performance, Physics

October 13, 2024 by hgpu

RBMD: A molecular dynamics package enabling to simulate 10 million all-atom particles in a single graphics processing unit

Weihang Gao, Teng Zhao, Yongfa Guo, Jiuyang Liang, Huan Liu, Maoying Luo, Zedong Luo, Wei Qin, Yichao Wang, Qi Zhou, Shi Jin, Zhenli Xu

Tags: Computational Physics, CUDA, Heterogeneous systems, Molecular dynamics, nVidia, nVidia GeForce RTX 4090, Package, Physics, Tesla A100, Tesla V100

July 28, 2024 by hgpu

GPU Parallelization of Astronomical Image Subtraction

Gustav Arneving, Hugo Wilhelmsson

Tags: Astrophysics, nVidia, nVidia GeForce GTX 1050, OpenCL, OpenMP, Physics, Thesis

June 23, 2024 by hgpu

Gaining Cross-Platform Parallelism for HAL’s Molecular Dynamics Package using SYCL

Viktor Skoblin, Felix Höfling, Steffen Christgau

Tags: AMD Radeon Instinct MI210, ATI, Computer science, CUDA, Molecular dynamics, nVidia, nVidia A100, nVidia A40, Package, Physics, SYCL

June 9, 2024 by hgpu


