high performance computing on graphics processing units: hgpu.org

Applications

Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU

Cristian Campos, Rafael Asenjo, Angeles Navarro

Source codes

Tags: Computer science, Data parallelism, Heterogeneous systems, Intel, Intel UHD 770, oneAPI, Package, SYCL

January 27, 2025 by hgpu

Adaptive Optimization Techniques for High-Performance Computing

Gulsum Gudukbay Akbulut

Tags: Computer science, CUDA, Deep learning, HPC, Machine learning, nVidia, Optimization, Performance, Tesla K80, Thesis

January 27, 2025 by hgpu

Dissecting the NVIDIA Hopper Architecture through Microbenchmarking and Multiple Level Analysis

Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Hongyuan Liu, Qiang Wang, Xiaowen Chu

Tags: Benchmarking, Computer science, CUDA, Hardware Architecture, nVidia, nVidia A100, nVidia GeForce RTX 4090, nVidia H800, Performance, PTX

January 27, 2025 by hgpu

Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU

Naifeng Zhang, Franz Franchetti

Source codes

Tags: Code generation, Computer science, CUDA, Linear Algebra, Modular arithmetic, nVidia, nVidia GeForce RTX 4090, nVidia H100, nVidia V100, Package, Security

January 20, 2025 by hgpu

GSParLib: A multi-level programming interface unifying OpenCL and CUDA for expressing stream and data parallelism

Dinei A. Rockenbach, Gabriell Araujo, Dalvan Griebler, Luiz Gustavo Fernandes

Source codes

Tags: Computer science, CUDA, Data parallelism, nVidia, nVidia GeForce RTX 3090, OpenCL, OpenMP, Package

January 20, 2025 by hgpu

Keras Sig: Efficient Path Signature Computation on GPU in Keras 3

Rémi Genet, Hugo Inzirillo

Source codes

Tags: Computer science, CUDA, Deep learning, Machine learning, nVidia, nVidia GeForce RTX 4090, Package, Python, TensorFlow

January 20, 2025 by hgpu

A User’s Guide to KSig: GPU-Accelerated Computation of the Signature Kernel

Csaba Tóth, Danilo Jr Dela Cruz, Harald Oberhauser

Source codes

Tags: Computer science, CUDA, Machine learning, nVidia, nVidia A100, Package, Python

January 20, 2025 by hgpu

Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs

Jonah Ekelund, Stefano Markidis, Ivy Peng

Source codes

Tags: Computer science, CUDA, FDTD, HIP, nVidia, nVidia A100, nVidia H100, Package, Performance

January 20, 2025 by hgpu

CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection

Ruijun Feng, Hammond Pearce, Pietro Liguori, Yulei Sui

Tags: Computer science, CUDA, LLM, nVidia, nVidia H100, Python, PyTorch, Security

January 13, 2025 by hgpu

SCALE-Ahead-Of-Time Compilation of CUDA for AMD GPUs

Manos Pavlidakis, Chris Kitching, Nicholas Tomlinson, Michael Søndergaard

Tags: ATI, Compilers, Computer science, CUDA, HIP, nVidia, Package, Portability, PTX

January 13, 2025 by hgpu

Validation of GPU Computation in Decentralized, Trustless Networks

Eric Boniardi, Stanley Bishop, Alison Haire

Tags: Computer science, Distributed computing

January 13, 2025 by hgpu

LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

Jiaping Wang, Simiao Zhang, Qiao-Chu He, Yifan Chen

Source codes

Tags: Benchmarking, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia A100, nVidia RTX A6000, Package, Python, PyTorch

January 13, 2025 by hgpu
