high performance computing on graphics processing units: hgpu.org

Applications

Compiler Support for Speculation in Decoupled Access/Execute Architectures

Robert Szafarczyk, Syed Waqar Nabi, Wim Vanderbauwhede

Tags: Compilers, Computer science, HLS, Performance, Prefetch

February 10, 2025 by hgpu

Towards autonomous resource management: Deep learning prediction of CPU-GPU load balancing

Inigo Gabirondo Lopez

Tags: AMD Radeon HD 7970, Artificial intelligence, ATI, Computer science, Deep learning, Heterogeneous systems, load balancing, nVidia, nVidia GeForce GTX 970, OpenCL

February 10, 2025 by hgpu

Ilargi: a GPU Compatible Factorized ML Model Training Framework

Wenbo Sun, Rihan Hai

Source codes

Tags: Computer science, CUDA, Linear Algebra, Machine learning, nVidia, nVidia A40, Package

February 10, 2025 by hgpu

CPU-GPU co-execution through the exploitation of hybrid technologies via SYCL

Nozal Raúl, Jose Luis Bosque

Tags: Computer science, CUDA, Heterogeneous systems, Hybrid computing, LLVM, load balancing, nVidia, nVidia GeForce GT 1030, oneAPI, OpenCL, performance portability, SYCL

February 3, 2025 by hgpu

Modernization and Optimization of MPI Codes

Tim Jammer

Source codes

Tags: Benchmarking, Computer science, HPC, MPI, OpenMP, Package, Performance, Thesis

February 3, 2025 by hgpu

On the Partitioning of GPU Power among Multi-Instances

Tirth Vamja, Kaustabha Ray, Felix George, UmaMaheswari C Devi

Tags: Computer science, CUDA, Energy-efficient computing, Machine learning, Matrix multiplication, nVidia, nVidia A100, nVidia V100, Performance

February 3, 2025 by hgpu

Fully-Automated Code Generation for Efficient Computation of Sparse Matrix Permanents on GPUs

Deniz Elbek, Kamer Kaya

Tags: Code generation, Computer science, CUDA, nVidia, nVidia A100, nVidia Quadro GV100, PTX, Sparse matrix

February 3, 2025 by hgpu

Profiling Apple Silicon Performance for ML Training

Dahua Feng, Zhiming Xu, Rongxiang Wang, Felix Xiaozhu Lin

Tags: AI, Apple M2 Max, Apple M2 Pro, Apple M2 Ultra, Computer science, CUDA, Linear Algebra, LLM, Machine learning, nVidia, nVidia GeForce RTX 4090, nVidia GeForce RTX 2080 Ti, nVidia Quadro RTX 4000, nVidia RTX A6000, Performance, PyTorch

February 3, 2025 by hgpu

Column-Oriented Datalog on the GPU

Yihao Sun, Sidharth Kumar, Thomas Gilray, Kristopher Micinski

Source codes

Tags: Computer science, CUDA, Databases, nVidia, nVidia H100, Package

January 27, 2025 by hgpu

Good things come in small packages: Should we adopt Lite-GPUs in AI infrastructure?

Burcu Canakci, Junyi Liu, Xingbo Wu, Nathanaël Cheriere, Paolo Costa, Sergey Legtchenko, Dushyanth Narayanan, Ant Rowstron

Tags: AI, Computer science, Hardware Architecture

January 27, 2025 by hgpu

Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU

Cristian Campos, Rafael Asenjo, Angeles Navarro

Source codes

Tags: Computer science, Data parallelism, Heterogeneous systems, Intel, Intel UHD 770, oneAPI, Package, SYCL

January 27, 2025 by hgpu

Adaptive Optimization Techniques for High-Performance Computing

Gulsum Gudukbay Akbulut

Tags: Computer science, CUDA, Deep learning, HPC, Machine learning, nVidia, Optimization, Performance, Tesla K80, Thesis

January 27, 2025 by hgpu
