high performance computing on graphics processing units: hgpu.org

Code generation

Scope is all you need: Transforming LLMs for HPC Code

Tal Kadosh, Niranjan Hasabnis, Vy A. Vo, Nadav Schneider, Neva Krien, Abdul Wasay, Nesreen Ahmed, Ted Willke, Guy Tamir, Yuval Pinter, Timothy Mattson, Gal Oren

Tags: Code generation, Computer science, Deep learning, HPC, nVidia, nVidia A40, nVidia H100, OpenMP, Package

September 6, 2023 by hgpu

HPAC-Offload: Accelerating HPC Applications with Portable Approximate Computing on the GPU

Zane Fink, Konstantinos Parasyris, Giorgis Georgakoudis, Harshitha Menon

Tags: AMD Radeon Instinct MI250X, ATI, Code generation, Computer science, HPC, LLVM, nVidia, nVidia V100, OpenMP, Package

September 6, 2023 by hgpu

Generating Parallel OpenCL and OpenMP Programs from Dataflow Graphs

Evelyn Borth

Tags: Benchmarking, Code generation, Computer science, OpenCL, OpenMP, Thesis

August 20, 2023 by hgpu

ProtoX: A First Look

Het Mankad, Sanil Rao, Brian Van Straalen, Phillip Colella, Franz Franchetti

Tags: Code generation, Computer science, Differential equations, Laplace and Poisson equation, Mathematical Software, Partial differential equations, PDEs, Poisson equation, Stencil computation

July 24, 2023 by hgpu

Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks

Mingyu Liang, Wenyin Fu, Louis Feng, Zhongyi Lin, Pavani Panakanti, Shengbao Zheng, Srinivas Sridharan, Christina Delimitrou

Tags: AI, Benchmarking, Code generation, Computer science, CUDA, nVidia, Package, Performance, PyTorch, Tesla A100, Tesla V100

July 16, 2023 by hgpu

Modeling Parallel Programs using Large Language Models

Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele

Tags: Code generation, Computer science, HPC, MPI, nVidia, nVidia A100, OpenMP

July 9, 2023 by hgpu

Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation

William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter

Tags: AI, Artificial intelligence, Benchmarking, Code generation, Computer science, CUDA, Fortran, HPC, Julia, nVidia, OpenACC, OpenMP, Package, Python

July 2, 2023 by hgpu

ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña

Tags: Algorithms, Benchmarking, Code generation, Compilers, Computer science, nVidia, OpenACC, OpenMP, Tesla A100

June 25, 2023 by hgpu

ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time

Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry

Tags: Code generation, Computer science, CUDA, Deep learning, nVidia, nVidia GeForce RTX 3070

May 28, 2023 by hgpu

Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation

Juan Fumero, György Rethy, Athanasios Stratikopoulos, Nikos Foutris, Christos Kotselidis

Tags: Code generation, Computer science, Heterogeneous systems, Java, OpenCL, Package

May 21, 2023 by hgpu

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen

Tags: Code generation, Computer science, CUDA, GEMM, Linear Algebra, Matrix multiplication, nVidia, nVidia A100, Package, Performance, Reliability, Tesla T4

May 7, 2023 by hgpu

Fuzzing Loop Optimizations in Compilers for C++ and Data-Parallel Languages

Vsevolod Livinskii, Dmitry Babokin, John Regehr

Tags: Code generation, Compilers, Computer science, oneAPI, Package, SYCL

April 23, 2023 by hgpu
