high performance computing on graphics processing units: hgpu.org

CrystalGPU: Transparent and Efficient Utilization of GPU Power

Abdullah Gharaibeh, Samer Al-Kiswany, Matei Ripeanu

Tags: Computer science, CUDA, nVidia, nVidia GeForce 9800 GX2, Package, Performance

October 28, 2010 by hgpu

An integrated GPU power and performance model

Sunpyo Hong, Hyesoon Kim

Tags: Analytical model, Computer science, CUDA, Energy-efficient computing, nVidia, nVidia GeForce GTX 280, Performance

October 28, 2010 by hgpu

GPU as a General Purpose Computing Resource

Qihang Huang, Zhiyi Huang, Paul Werstein, Martin Purvis

Tags: Computer science, CUDA, nVidia, Performance, Programming techniques

October 27, 2010 by hgpu

Cache and bandwidth aware matrix multiplication on the GPU

J. Hall, N. Carr, J. Hart

Tags: Algorithms, Computer science, Linear Algebra, Performance

October 27, 2010 by hgpu

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, Wen-mei W. Hwu

Tags: Computer science, CUDA, High-level Languages, nVidia, nVidia GeForce 8800 GTX, Performance, Programming techniques

October 27, 2010 by hgpu

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, Pradeep Dubey

Tags: Algorithm optimization, Computer science, nVidia, nVidia GeForce GTX 280, Performance

October 27, 2010 by hgpu

On the limits of GPU acceleration

Richard Vuduc, Aparna Chandramowlishwaran, Jee Choi, Murat Guney, Aashay Shringarpure

Tags: Computer science, nVidia, nVidia GeForce GTX 285, Performance, Tesla C1060, Tesla S1070

October 27, 2010 by hgpu
