high performance computing on graphics processing units: hgpu.org

Packages

Compiler Technologies in Deep Learning Co-Design: A Survey

Hongbin Zhang, Mingjie Xing, Yanjun Wu, Chen Zhao

Tags: Compilers, Computer science, Deep learning, HLS, OpenCL, Package, survey

June 4, 2023 by hgpu

Implementation Techniques for SPMD Kernels on CPUs

Joachim Meyer, Aksel Alpay, Sebastian Hack, Holger Fröning, Vincent Heuveline

Tags: Compilers, Computer science, CUDA, Heterogeneous systems, nVidia, OpenCL, Package, performance portability

June 4, 2023 by hgpu

Genomics-GPU: A Benchmark Suite for GPU-accelerated Genome Analysis

Zhuren Liu, Shouzhe Zhang, Justin Garrigus, Hui Zhao

Tags: Benchmarking, Bioinformatics, Biology, Computer science, CUDA, Genomics, nVidia, nVidia GeForce RTX 3070, Package

May 28, 2023 by hgpu

Experiences Migrating CUDA to SYCL: A Molecular Docking Case Study

Leonardo Solis-Vasquez, Edward Mascarenhas, Andreas Koch

Tags: Chemistry, Computer science, CUDA, molecular docking, nVidia, nVidia A100, oneAPI, Package, SYCL

May 28, 2023 by hgpu

PyTorch Hyperparameter Tuning – A Tutorial for spotPython

Thomas Bartz-Beielstein

Tags: Computer science, Deep learning, Package, Python, PyTorch, Tutorial

May 28, 2023 by hgpu

An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing

Philip Salzmann, Fabian Knorr, Peter Thoman, Philipp Gschwandtner, Biagio Cosenza, Thomas Fahringer

Tags: Benchmarking, Computer science, CUDA, GPU cluster, MPI, nVidia, nVidia V100, Package, SYCL

May 21, 2023 by hgpu

Dragon-Alpha&cu32: A Java-based Tensor Computing Framework With its High-Performance CUDA Library

Zhiyi Zhang, Pengfei Zhang, Qi Wang

Tags: Computer science, CUDA, Deep learning, Heterogeneous systems, Java, nVidia, nVidia GeForce RTX 3060 Ti, Package

May 21, 2023 by hgpu

Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation

Juan Fumero, György Rethy, Athanasios Stratikopoulos, Nikos Foutris, Christos Kotselidis

Tags: Code generation, Computer science, Heterogeneous systems, Java, OpenCL, Package

May 21, 2023 by hgpu

TorchBench: Benchmarking PyTorch with High API Surface Coverage

Yueming Hao, Xu Zhao, Bin Bao, David Berard, Will Constable, Adnan Aziz, Xu Liu

Tags: AMD Radeon Instinct MI210, ATI, Benchmarking, Computer science, Deep learning, nVidia, nVidia A100, Package, PyTorch

May 14, 2023 by hgpu

Prediction of Performance and Power Consumption of GPGPU Applications

Gargi Alavani, Santonu Sarkar

Tags: Computer science, CUDA, Energy-efficient computing, Java, Machine learning, nVidia, Package, Performance, PTX, Tesla K20, Thesis

May 14, 2023 by hgpu

Descend: A Safe GPU Systems Programming Language

Bastian Köpcke, Sergei Gorlatch, Michel Steuwer

Tags: Computer science, CUDA, nVidia, OpenCL, Package, Programming Languages, Tesla P100

May 14, 2023 by hgpu

Dynamically Finding Optimal Kernel Launch Parameters for CUDA Programs

Taabish Jeshani

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia GeForce RTX 2070, Package, Performance, Thesis

May 7, 2023 by hgpu

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
