high performance computing on graphics processing units: hgpu.org

LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

Jiaping Wang, Simiao Zhang, Qiao-Chu He, Yifan Chen

Source codes

Tags: Benchmarking, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia A100, nVidia RTX A6000, Package, Python, PyTorch

January 13, 2025 by hgpu

Finding Missed Code Size Optimizations in Compilers using LLMs

Davide Italiano, Chris Cummins

Tags: Computer science, LLM, Machine learning, Optimization, Programming Languages, Software Engineering

January 6, 2025 by hgpu

Enhancing Deployment-Time Predictive Model Robustness for Code Analysis and Optimization

Huanting Wang, Patrick Lenihan, Zheng Wang

Source codes

Tags: Artificial intelligence, Computer science, Machine learning, OpenCL, Optimization, Package, Software Engineering

January 6, 2025 by hgpu

Debunking the CUDA Myth Towards GPU-based AI Systems

Yunjae Lee, Juntaek Lim, Jehyeon Bang, Eunyeong Cho, Huijong Jeong, Taesu Kim, Hyungjun Kim, Joonhyung Lee, Jinseop Im, Ranggi Hwang, Se Jung Kwon, Dongsoo Lee, Minsoo Rhu

Tags: AI, Benchmarking, Computer science, CUDA, Intel, Intel Gaudi-2, nVidia, nVidia A100, Performance

January 6, 2025 by hgpu

Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement

Junjie Li

Tags: Computer science, CUDA, HPC, Linear Algebra, nVidia, nVidia GH200, OpenMPI, Physics, Quantum Physics

January 6, 2025 by hgpu

A comparison of HPC-based quantum computing simulators using Quantum Volume

Lourens van Niekerk, Dhiraj Kumar, Aasish Kumar Sharma, Tino Meisel, Martin Leandro Paleico, Christian Boehme

Tags: Benchmarking, CUDA, nVidia, nVidia A100, OpenCL, Overview, Physics, Quantum computing, Review

January 6, 2025 by hgpu

Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads

Avinash Maurya

Source codes

Tags: AI, Artificial intelligence, Computer science, CUDA, Deep learning, HPC, nVidia, nVidia DGX-A100, Package, Thesis

December 29, 2024 by hgpu

Asynchronous-Many-Task Systems: Challenges and Opportunities – Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX

Gregor Daiß, Patrick Diehl, Jiakun Yan, John K. Holmen, Rahulkumar Gayatri, Christoph Junghans, Alexander Straub, Jeff R. Hammond, Dominic Marcello, Miwako Tsuji, Dirk Pflüger, Hartmut Kaiser

Source codes

Tags: AMD Radeon Instinct MI100, AMD Radeon Instinct MI250X, Astrophysics, ATI, Computer science, CUDA, Heterogeneous systems, HIP, HPC, nVidia, nVidia A100, Package, performance portability, Physics