high performance computing on graphics processing units: hgpu.org

Refining HPCToolkit for application performance analysis at exascale

Laksono Adhianto, Jonathon Anderson, Robert Matthew Barnett, Dragana Grbic, Vladimir Indic, Mark Krentel, Yumeng Liu, Srdan Milaković, Wileam Phan, John Mellor-Crummey

Tags: AMD Radeon Instinct MI300A, ATI, Computer science, CUDA, HPC, MPI, OpenCL, Package, Performance

September 15, 2024 by hgpu

VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing

Jaebeom Jeon, Minseong Gil, Junsu Kim, Jaeyong Park, Gunjae Koo, Myung Kuk Yoon, Yunho Oh

Tags: AI, Artificial intelligence, Computer science, CUDA, Deep learning, Neural networks, nVidia, nVidia Jetson AGX Orin, Performance

September 1, 2024 by hgpu

Exploring Scalability in C++ Parallel STL Implementations

Ruben Laso, Diego Krupitza, Sascha Hunold

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia Ampere A2, OpenMP, Package, Performance, Tesla T4

September 1, 2024 by hgpu

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects

Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler

Tags: AMD Radeon Instinct MI250X, ATI, Benchmarking, Computer science, CUDA, HPC, MPI, nVidia, nVidia A100, nVidia H100, Performance

September 1, 2024 by hgpu

Characterizing CUDA and OpenMP Synchronization Primitives

Brandon Alexander Burtchell, Martin Burtscher

Tags: Computer science, CUDA, nVidia, nVidia A100, nVidia GeForce RTX 2070, nVidia GeForce RTX 4090, OpenMP, Package, Performance

August 25, 2024 by hgpu

The VerCors Verifier: A Progress Report

Lukas Armborst, Pieter Bos, Lars B. van den Haak, Marieke Huisman, Robert Rubbens, Ömer Şakar, Philip Tasche

Tags: Computer science, CUDA, OpenCL, OpenMP, Package, Performance, SYCL

August 18, 2024 by hgpu

* * *

Effects of OpenCL-Based Parallelization Methods on Explicit Numerical Methods to Solve the Heat Equation

Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric

Event-Based OpenMP Tasks for Time-Sensitive GPU-Accelerated Systems

The Landscape of GPU-Centric Communication

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Optimal Workload Placement on Multi-Instance GPUs

Refining HPCToolkit for application performance analysis at exascale

VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing

Exploring Scalability in C++ Parallel STL Implementations

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects

Characterizing CUDA and OpenMP Synchronization Primitives

The VerCors Verifier: A Progress Report

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)