high performance computing on graphics processing units: hgpu.org

Qian Gong, Jieyang Chen, Ben Whitney, Xin Liang, Viktor Reshniak, Tania Banerjee, Jaemoon Lee, Anand Rangarajan, Lipeng Wan, Nicolas Vidal, Qing Liu, Ana Gainaru, Norbert Podhorszki, Richard Archibald, Sanjay Ranka, Scott Klasky

View

Download (PDF)

Source codes

Tags: Compression, Computer science, CUDA, HIP, HPC, Numerical Analysis, nVidia, nVidia V100, OpenMP, Package, SYCL

January 21, 2024 by hgpu

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

See all packages

* * *

high performance computing on graphics processing units: hgpu.org

Applications

Gallatin: A General-Purpose GPU Memory Manager

Deductive verification for SYCL

LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory

Towards a GPU-Parallelization of the neXtSIM-DG Dynamical Core

Assessing the Impact of Compiler Optimizations on GPUs Reliability

BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU

A Heterogeneous Inference Framework for a Deep Neural Network

A Survey on Hardware Accelerators for Large Language Models

swCUDA: Auto parallel code translation framework from CUDA to ATHREAD for new generation sunway supercomputer

Parallel and Heterogeneous Timing Analysis: Partition, Algorithm, and System

MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)