high performance computing on graphics processing units: hgpu.org

gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur

Tags: Compression, Computer science, GPU cluster, MPI, nVidia, nVidia A100

August 13, 2023 by hgpu

Static and Dynamic Analyses for Efficient GPU Execution

Philip Munksgaard

Tags: AMD Radeon Instinct MI100, ATI, Benchmarking, Computer science, nVidia, nVidia A100, OpenCL, Performance, Thesis

August 13, 2023 by hgpu

Fast Knowledge Graph Completion using Graphics Processing Units

Chun-Hee Lee, Dong-oh Kang, Hwa Jeon Song

Tags: AI, Algorithms, Computer science, CUDA, Databases, Graph theory, nVidia, nVidia A100

July 30, 2023 by hgpu

A portable C++ library for memory and compute abstraction on multi-core CPUs and GPUs

Pietro Incardona, Aryaman Gupta, Serhii Yaskovets, Ivo F. Sbalzarini

Source codes

Tags: AMD RX Vega 64, ATI, Computer science, CUDA, nVidia, nVidia A100, nVidia GeForce RTX 3090, OpenACC, OpenCL, OpenMP, Package, Performance, performance portability, SYCL

July 30, 2023 by hgpu

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

Hanyan Cao, Feng Pan, Yijia Wang, Pan Zhang

Source codes

Tags: Artificial intelligence, Computer science, Neural networks, nVidia, nVidia A100, Package

July 24, 2023 by hgpu

Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution

Shilei Tian, Barbara Chapman, Johannes Doerfert

Tags: Benchmarking, Computer science, CUDA, LLVM, nVidia, nVidia A100, OpenMP, Performance

July 24, 2023 by hgpu

Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc

Jacob Faibussowitsch, Mark F. Adams, Richard Tran Mills, Stefano Zampini, Junchao Zhang

Source codes

Tags: Computer science, CUDA, nVidia, nVidia A100, Package, Performance

July 9, 2023 by hgpu

CUDAnalyst (CUDA + Analyst)

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

CodegenBench

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Agentic Code Optimization via Compiler-LLM Cooperation

Device Virtual Machine (DVM)

DVM: Real-Time Kernel Generation for Dynamic AI Models

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

* * *

Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications

Performant low-order matrix-free finite element kernels on GPU architectures

Novel insights on atomic synchronization for sort-based group-by on GPUs

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

Porting Batched Iterative Solvers onto Intel GPUs with SYCL

gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

Static and Dynamic Analyses for Efficient GPU Execution

Fast Knowledge Graph Completion using Graphics Processing Units

A portable C++ library for memory and compute abstraction on multi-core CPUs and GPUs

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution

Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc

Recent source codes

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Agentic Code Optimization via Compiler-LLM Cooperation

Device Virtual Machine (DVM)

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Most viewed papers (last 30 days)