high performance computing on graphics processing units: hgpu.org

Packages

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer

Tags: Benchmarking, Computer science, HPC, nVidia, nVidia A100, Package, performance portability, SYCL, Task scheduling

March 23, 2025 by hgpu

Hercules: A Compiler for Productive Programming of Heterogeneous Systems

Russel Arbore, Aaron Councilman, Xavier Routh, Ryan Ziegler, Praneet Rathi, Vikram Adve

Tags: Code generation, Computer science, CUDA, Heterogeneous systems, LLVM, nVidia, nVidia GeForce RTX 2080 Ti, Package, Programming Languages, Rust

March 23, 2025 by hgpu

A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading

Mohammad Atif, Tianle Wang, Zhihua Dong, Charles Leggett, Meifeng Lin

Tags: Benchmarking, Computer science, Heterogeneous systems, nVidia A100, nVidia RTX A6000, nVidia V100, OpenMP, Package

March 10, 2025 by hgpu

SUperman: Efficient Permanent Computation on GPUs

Deniz Elbek, Fatih Taşyaran, Bora Uçar, Kamer Kaya

Tags: Computer science, CUDA, HPC, Numerical Analysis, nVidia, nVidia A100, nVidia Quadro GV100, OpenMPI, Package

March 10, 2025 by hgpu

TransCL: An Automatic CUDA-to-OpenCL Programs Transformation Framework

Changqing Shi, Yufei Sun, Rui Chen, Jiahao Wang, Qiang Guo, Chunye Gong, Yicheng Sui, Yutong Jin, Yuzhi Zhang

Tags: Code generation, Computer science, CUDA, HPC, Memory model, nVidia, OpenCL, Package

March 3, 2025 by hgpu

pyATF: Constraint-Based Auto-Tuning in Python

Richard Schulze, Sergei Gorlatch, Ari Rasch

Tags: Auto-Tuning, Compilers, Computer science, CUDA, nVidia A100, OpenCL, Package, Performance, Python

March 3, 2025 by hgpu

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Jianling Li, Shangzhan Li, Zhenye Gao, Qi Shi, Yuxuan Li, Zefan Wang, Jiacheng Huang, Haojie Wang, Jianrong Wang, Xu Han, Zhiyuan Liu, Maosong Sun

Tags: Benchmarking, Code generation, Computer science, CUDA, Deep learning, LLM, nVidia, nVidia A100, Package, Python

March 3, 2025 by hgpu

CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads

Radostin Stoyanov, Viktória Spišaková, Jesus Ramos, Steven Gurfinkel, Andrei Vagin, Adrian Reber, Wesley Armour, Rodrigo Bruno

Tags: AMD Radeon Instinct MI210, ATI, Computer science, CUDA, Deep learning, nVidia, nVidia A100, nVidia H100, nVidia RTX A6000, Package, ROCm

March 3, 2025 by hgpu

KernelBench: Can LLMs Write Efficient GPU Kernels?

Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini

Tags: AI, Benchmarking, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia L40s, Package, PyTorch

February 24, 2025 by hgpu

Seamless acceleration of Fortran intrinsics via AMD AI engines

Nick Brown, Gabriel Rodríguez Canal

Tags: AI, AMD, Computer science, Fortran, Linear Algebra, Package, Performance

February 24, 2025 by hgpu

Forecasting time series with constraints

Nathan Doumèche, Francis Bach, Éloi Bedek, Gérard Biau, Claire Boyer, Yannig Goude

Tags: AI, Benchmarking, Computer science, Linear Algebra, Machine learning, nVidia, nVidia L4, Package

February 24, 2025 by hgpu

The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition

Robert Tjarko Lange, Aaditya Prasad, Qi Sun, Maxence Faldor, Yujin Tang, David Ha

Tags: AI, Computer science, CUDA, LLM, nVidia, nVidia H100, Package, Performance

February 24, 2025 by hgpu

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)