high performance computing on graphics processing units: hgpu.org

hgpu.org » nVidia H100

Scope is all you need: Transforming LLMs for HPC Code

Tal Kadosh, Niranjan Hasabnis, Vy A. Vo, Nadav Schneider, Neva Krien, Abdul Wasay, Nesreen Ahmed, Ted Willke, Guy Tamir, Yuval Pinter, Timothy Mattson, Gal Oren

Tags: Code generation, Computer science, Deep learning, HPC, nVidia, nVidia A40, nVidia H100, OpenMP, Package

September 6, 2023 by hgpu

Porting Batched Iterative Solvers onto Intel GPUs with SYCL

Phuong Nguyen, Pratik Nayak, Hartwig Anzt

Tags: Computer science, CUDA, Intel, Intel Data Center GPU Max 1550, nVidia, nVidia A100, nVidia H100, Package, performance portability, Physics, SYCL

August 20, 2023 by hgpu
