high performance computing on graphics processing units: hgpu.org

Programming

APPy: Annotated Parallelism for Python on GPUs

Tong Zhou, Jun Shirako, Vivek Sarkar

Tags: Code generation, Compilers, Computer science, CUDA, Machine learning, nVidia, nVidia GeForce RTX 3090, Python

February 25, 2024 by hgpu

Analyzing GPU Performance in Virtualized Environments: A Case Study

Adel Belkhiri, Michel Dagenais

Tags: Computer science, CUDA, Java, nVidia, OpenCL, Package, Performance, Virtualization

February 25, 2024 by hgpu

Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures

Negar Alizadeh, Fernando Castor

Tags: Computer science, CUDA, Deep learning, Energy-efficient computing, nVidia, nVidia GeForce RTX 3070, Package, PyTorch, TensorFlow

February 25, 2024 by hgpu

Benchmarking and Dissecting the Nvidia Hopper GPU Architecture

Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Qiang Wang, Xiaowen Chu

Tags: Artificial intelligence, Benchmarking, Computer science, CUDA, Deep learning, nVidia, nVidia A100, nVidia GeForce RTX 4090, nVidia H800, Performance, PTX

February 25, 2024 by hgpu

Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems

Manuel Costanzo, Enzo Rucci, Carlos García-Sanchez, Marcelo Naiouf, Manuel Prieto-Matías

Tags: AMD Radeon RX 6700 XT, AMD Radeon RX Vega 6, ATI, Bioinformatics, Biology, Computational biology, CUDA, Heterogeneous systems, Intel Arc A770, Intel UHD 630, nVidia, nVidia GeForce GTX 1080, nVidia GeForce GTX 980, nVidia GeForce RTX 2070, nVidia GeForce RTX 3090, nVidia V100, oneAPI, Package, Sequence alignment, SYCL

February 25, 2024 by hgpu

TransAxx: Efficient Transformers with Approximate Computing

Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel

Tags: Computer science, CUDA, Machine learning, Neural networks, nVidia, nVidia V100, Package, PyTorch

February 18, 2024 by hgpu

An Evaluative Comparison of Performance Portability across GPU Programming Models

Joshua H. Davis, Pranav Sivaraman, Isaac Minn, Konstantinos Parasyris, Harshitha Menon, Giorgis Georgakoudis, Abhinav Bhatele

Tags: AMD Radeon Instinct MI250X, AMD Radeon Instinct Mi50, ATI, Computer science, CUDA, Heterogeneous systems, HIP, MPI, nVidia, nVidia V100, OpenACC, OpenMP, Performance, performance portability, SYCL

February 18, 2024 by hgpu

pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations

Ruben Laso, Diego Krupitza, Sascha Hunold

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia Ampere A2, OpenMP, performance portability, Tesla P4

February 18, 2024 by hgpu

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, Hyungjun Kim

Tags: Computer science, CUDA, Deep learning, Machine learning, Matrix multiplication, Mixed precision, nVidia, nVidia A100, nVidia GeForce RTX 4090, nVidia RTX A6000, Package

February 18, 2024 by hgpu

Training DNN Models over Heterogeneous Clusters with Optimal Performance

Chengyi Nie, Jessica Maghakian, Zhenhua Liu

Tags: Computer science, CUDA, Deep learning, Heterogeneous systems, load balancing, Neural networks, nVidia, nVidia RTX 6000 Ada

February 12, 2024 by hgpu

Out of kernel tuning and optimizations for portable large-scale docking experiments on GPUs

Gianmarco Accordi, Davide Gadioli, Emanuele Vitali, Luigi Crisci, Biagio Cosenza, Andrea Beccari, Gianluca Palermo

Tags: Computer science, CUDA, HPC, molecular docking, nVidia, nVidia A100, oneAPI, Performance, SYCL

February 12, 2024 by hgpu

Gallatin: A General-Purpose GPU Memory Manager

Hunter McCoy, Prashant Pandey

Tags: Computer science, CUDA, HPC, Memory, nVidia, nVidia A40, Package

February 4, 2024 by hgpu
