high performance computing on graphics processing units: hgpu.org

BootCMatchG: An adaptive Algebraic MultiGrid linear solver for GPUs

Massimo Bernaschi, Pasqua D'Ambra, Dario Pasquini

Tags: Computer science, CUDA, multigrid, nVidia, Package, Sparse linear iterative solver

November 29, 2020 by hgpu

AZP: Automatic Specialization for Zero Values in Gaming Applications

Mark W. Stephenson, Ram Rangan

Tags: Computer science, Games, HLSL, nVidia, nVidia GeForce RTX 2080, OpenGL, Performance, Vulkan

November 29, 2020 by hgpu

Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL

James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian

Tags: Book, Computer science, FPGA, Heterogeneous systems, SYCL

November 22, 2020 by hgpu

A Survey of System Architectures and Techniques for FPGA Virtualization

Masudul Hassan Quraishi, Erfan Bank Tavakoli, Fengbo Ren

Tags: Computer science, FPGA, Hardware Architecture, HLS, OpenCL, survey, Virtualization

November 22, 2020 by hgpu

A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression

Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao

Tags: Compression, Computer science, CUDA, Deep learning, Neural networks, nVidia, Package, Tesla V100

November 22, 2020 by hgpu

Ginkgo – A Math Library designed for Platform Portability

Terry Cojean, Yu-Hsiang "Mike" Tsai, Hartwig Anzt

Tags: Computer science, CUDA, HIP, nVidia, Package, Performance, performance portability

November 22, 2020 by hgpu

GPURepair: Automated Repair of GPU Kernels

Saurabh Joshi, Gautam Muduganti

Tags: Computer science, CUDA, OpenCL, Package, Software Engineering

November 22, 2020 by hgpu

Adaptive Data Migration in Load-Imbalanced HPC Applications

Parsa Amini

Tags: Computer science, CUDA, Heterogeneous systems, HPC, load balancing, nVidia, Package, Tesla P100, Tesla V100, Thesis

November 15, 2020 by hgpu

Runtime Performances Benchmark for Knowledge Graph Embedding Methods

Angelica Sofia Valeriani

Tags: Benchmarking, Computer science, CUDA, Machine learning, nVidia, nVidia GeForce GTX 960, Package

November 15, 2020 by hgpu

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

Songxiang Liu, Yuewen Cao, Na Hu, Dan Su, Helen Meng

Tags: Computer science, nVidia, Speech recognition, Tesla P40

November 15, 2020 by hgpu

Exploring the acceleration of Nekbone on reconfigurable architectures

Nick Brown

Tags: Computer science, FPGA, HLS, nVidia, OpenCL, Tesla V100

November 15, 2020 by hgpu

Automatic GPU optimization through higher-order functions in functional languages

John Wikman

Tags: Computer science, CUDA, nVidia, nVidia Quadro P 2000, Optimization, Thesis

November 15, 2020 by hgpu

CUDAnalyst (CUDA + Analyst)

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

CodegenBench

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Agentic Code Optimization via Compiler-LLM Cooperation

Device Virtual Machine (DVM)

DVM: Real-Time Kernel Generation for Dynamic AI Models

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Recent source codes

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Agentic Code Optimization via Compiler-LLM Cooperation

Device Virtual Machine (DVM)

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Most viewed papers (last 30 days)