high performance computing on graphics processing units: hgpu.org

Python-Based Quantum Chemistry Calculations with GPU Acceleration

Xiaojie Wu, Qiming Sun, Zhichen Pu, Tianze Zheng, Wenzhi Ma, Wen Yan, Xia Yu, Zhengxiao Wu, Mian Huo, Xiang Li, Weiluo Ren, Sheng Gong, Yumin Zhang, Weihao Gao

Tags: Chemical Physics, Chemistry, Computational Physics, CUDA, nVidia, nVidia A100, Package, Python, Quantum Physics

April 21, 2024 by hgpu

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Erik D. Huckvale, Hunter N.B. Moseley

Tags: Computer science, CUDA, nVidia, Package, Performance, Profiling, Python

April 7, 2024 by hgpu

Using Intel oneAPI for Multi-hybrid Acceleration Programming with GPU and FPGA Coupling

Wentao Liang, Norihisa Fujita, Ryohei Kobayashi, Taisuke Boku

Tags: Computer science, CUDA, FPGA, Heterogeneous systems, nVidia, oneAPI, OpenCL, SYCL, Tesla V100

April 7, 2024 by hgpu

Retargeting and Respecializing GPU Workloads for Performance Portability

Ivan R. Ivanov, Oleksandr Zinenko, Jens Domke, Toshio Endo, William S. Moses

Tags: AMD Radeon Instinct MI210, AMD Radeon RX 6800, ATI, Computer science, CUDA, HIP, HPC, nVidia, nVidia A100, nVidia RTX A4000, Package, performance portability

March 24, 2024 by hgpu

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

John Tramm, Paul Romano, Patrick Shriwise, Amanda Lund, Johannes Doerfert, Patrick Steinbrecher, Andrew Siegel, Gavin Ridley

Tags: AMD Radeon Instinct MI250X, ATI, Computer science, CUDA, Intel, Intel Data Center GPU Max 1550, Intel Ponte Vecchio Max 1100, nVidia, nVidia A100, OpenMP, Package, performance portability

March 24, 2024 by hgpu

Parallel Gaussian process with kernel approximation in CUDA

Davide Carminati

Tags: Benchmarking, Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce GTX 1050, nVidia GeForce RTX 2080, Package

March 24, 2024 by hgpu

Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality

Adrian Perez Dieguez, Min Choi, Mahmut Okyay, Mauro Del Ben, Bryan M. Wong, Khaled Z. Ibrahim

Tags: Computer science, CUDA, HPC, MPI, nVidia, nVidia A100, OpenMP, Performance

March 18, 2024 by hgpu

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Automated Deep Learning Optimization via DSL-Based Source Code Transformation

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

CuPBoP: Making CUDA a Portable Language

Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Python-Based Quantum Chemistry Calculations with GPU Acceleration

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
