high performance computing on graphics processing units: hgpu.org

Forecasting time series with constraints

Nathan Doumèche, Francis Bach, Éloi Bedek, Gérard Biau, Claire Boyer, Yannig Goude

Tags: AI, Benchmarking, Computer science, Linear Algebra, Machine learning, nVidia, nVidia L4, Package

February 24, 2025 by hgpu

Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment

Ben Dong, Qian Wang

Tags: Benchmarking, Cloud, Computer science, LLM, nVidia, nVidia A100, Performance, Security

February 24, 2025 by hgpu

The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition

Robert Tjarko Lange, Aaditya Prasad, Qi Sun, Maxence Faldor, Yujin Tang, David Ha

Tags: AI, Computer science, CUDA, LLM, nVidia, nVidia H100, Package, Performance

February 24, 2025 by hgpu

cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio

Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello

Tags: Compression, Computer science, CUDA, nVidia, nVidia A100, nVidia GeForce RTX 3080, nVidia GeForce RTX 3090, Package, PTX

February 16, 2025 by hgpu

Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications

Rahulkumar Gayatri, Shilei Tian, Stephen Olivier, Johannes Doerfert, Eric Wright

Tags: AMD Radeon Instinct MI250X, ATI, Computer science, CUDA, HIP, MPI, nVidia, nVidia A100, OpenMP, Package, performance portability

February 16, 2025 by hgpu

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Heejun Lee, Geon Park, Jaduk Suh, Sung Ju Hwang

Tags: Computer science, LLM, NLP, nVidia, nVidia GeForce RTX 4090, Package

February 16, 2025 by hgpu

Teaching An Old Dog New Tricks: Porting Legacy Code to Heterogeneous Compute Architectures With Automated Code Translation

Nicolas Nytko, Andrew Reisner, J. David Moulton, Luke N. Olson, Matthew West

Tags: Code generation, Computer science, CUDA, Fortran, Heterogeneous systems, MPI, nVidia, nVidia A100, OpenCL

February 16, 2025 by hgpu

Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics

Yichao Yuan, Advait Iyer, Lin Ma, Nishil Talati

Tags: AMD Radeon Instinct MI100, ATI, Computer science, CUDA, Databases, HIP, nVidia, nVidia A40, Package

February 16, 2025 by hgpu

Optimizing the optimizer: increasing performance efficiency of modern compilers

Hafsah Shahzad

Tags: Compilers, Computer science, FPGA, HLS, Intel, nVidia, nVidia GeForce RTX 4070, Thesis

February 10, 2025 by hgpu

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs

Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Guoliang He, Xupeng Miao, Ana Klimovic, Bin Cui, Binhang Yuan, Eiko Yoneki

Tags: Benchmarking, Cloud, Computer science, Heterogeneous systems, LLM, nVidia, nVidia GeForce RTX 4090, nVidia H100, nVidia RTX A6000, Package

February 10, 2025 by hgpu

Towards autonomous resource management: Deep learning prediction of CPU-GPU load balancing

Inigo Gabirondo Lopez

Tags: AMD Radeon HD 7970, Artificial intelligence, ATI, Computer science, Deep learning, Heterogeneous systems, load balancing, nVidia, nVidia GeForce GTX 970, OpenCL

February 10, 2025 by hgpu

Ilargi: a GPU Compatible Factorized ML Model Training Framework

Wenbo Sun, Rihan Hai

Tags: Computer science, CUDA, Linear Algebra, Machine learning, nVidia, nVidia A40, Package

February 10, 2025 by hgpu
