high performance computing on graphics processing units: hgpu.org

hgpu.org » Thesis

Source-to-Source Transformations for GPU Code Generation

Julien de Castelnau, Thomas Koehler, Arthur Charguéraud, Clément Pit-Claudel

Tags: Code generation, Computer science, CUDA, nVidia, nVidia GeForce RTX 5060, Thesis

May 20, 2026 by hgpu

Decoupled Triton: A Block-Level Decoupled Language for Writing and Exploring Efficient Machine-Learning Kernels

Quinn Leo Pham

Tags: Compilers, Computer science, Machine learning, nVidia, nVidia RTX 5000 Ada, PyTorch, Thesis, Triton

December 7, 2025 by hgpu

High-Performance Computing: from Optimization to Automation

Bérenger Bramas

Tags: AMD Radeon Instinct MI210, ATI, Computer science, CUDA, HIP, HPC, nVidia, nVidia A100, nVidia T600, Performance, Thesis

October 12, 2025 by hgpu

Towards Calculating HPC CUDA Kernel Performance on Nvidia GPUs

Dumeni Manatschal

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia GeForce RTX 3080, Performance, PTX, Thesis

September 14, 2025 by hgpu

Accelerating a Linear Programming Algorithm on AMD GPUs

Xiyan Hu, Titus Parker, Connor Phillips, Yifa Yu

Tags: AMD Radeon Instinct MI210, AMD Radeon Instinct MI325X, ATI, Computer science, HIP, nVidia, nVidia A100, Package, Performance, PyTorch, ROCm, Thesis

August 31, 2025 by hgpu

GBOTuner: Autotuning of OpenMP Parallel Codes with Bayesian Optimization and Code Representation Transfer Learning

Kimsong Lor

Tags: Bayesian, Computer science, Fortran, Neural networks, OpenMP, Thesis

August 3, 2025 by hgpu

Using Deep Reinforcement Learning for Automatic Code Optimization in the MLIR Compiler

M. Ameur Nassim, M. Tirichine Mohammed

Tags: Computer science, High Energy Physics - Lattice, Neural networks, Physics, QCD, Thesis

July 20, 2025 by hgpu

Efficient GPU Implementation of Multi-Precision Integer Division

Aske N. Raahauge, Martin B. Marchioro, Marc I. Løvenskjold

Tags: Computer science, CUDA, Extended precision, Futhark, nVidia, nVidia A100, Package, Thesis

July 6, 2025 by hgpu

Enabling Profile Guided Optimizations (PGO) for Graphics

Emma Jansson

Tags: ARM, Compilers, Computer science, LLVM, Thesis, Vulkan

June 15, 2025 by hgpu

Acceleration as a Service (XaaS) Source Containers

Eiman Alnuaimi

Tags: Computer science, Heterogeneous systems, HPC, Intel, Intel Data Center GPU Max 1550, LLM, MPI, nVidia, nVidia GH200, nVidia V100, Optimization, Package, performance portability, Thesis

June 8, 2025 by hgpu

Low-cost edge computing using upcycled smartphones

Corentin Libert

Tags: Computer science, Package, Performance, TensorFlow, Thesis

May 25, 2025 by hgpu

Efficient deep learning inference on end devices

Ehsan Aghapour

Tags: Artificial intelligence, Computer science, Deep learning, Heterogeneous systems, OpenCL, Package, Thesis

May 4, 2025 by hgpu

CUDAnalyst (CUDA + Analyst)

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

CodegenBench

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Agentic Code Optimization via Compiler-LLM Cooperation

Device Virtual Machine (DVM)

DVM: Real-Time Kernel Generation for Dynamic AI Models

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Recent source codes

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Agentic Code Optimization via Compiler-LLM Cooperation

Device Virtual Machine (DVM)

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Most viewed papers (last 30 days)