high performance computing on graphics processing units: hgpu.org

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

Jiaqi Lv, Xufeng He, Yanchen Liu, Xu Dai, Yang Hu, Shouyi Yin

Tags: AI, Benchmarking, Compilers, Computer science, CUDA, Deep learning, LLM, nVidia, nVidia A100, Package, performance portability

June 15, 2025 by hgpu

Efficient deep learning inference on end devices

Ehsan Aghapour

Tags: Artificial intelligence, Computer science, Deep learning, Heterogeneous systems, OpenCL, Package, Thesis

May 4, 2025 by hgpu

DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training

Masahiro Tanaka, Du Li, Umesh Chand, Ali Zafar, Haiying Shen, Olatunji Ruwase

Tags: Computer science, CUDA, Deep learning, Distributed computing, nVidia, nVidia H100, Package, Prefetch

April 27, 2025 by hgpu

Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs

Dimitar Mileski, Nikola Petrovski, Marjan Gusev

Tags: Computer science, CUDA, Deep learning, HPC, LLM, nVidia, nVidia A100, PTX, PyTorch

April 13, 2025 by hgpu

GPU-centric Communication Schemes for HPC and ML Applications

Naveen Namashivayam

Tags: Computer science, Deep learning, Heterogeneous systems, HPC, Machine learning, MPI, survey

April 13, 2025 by hgpu

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives

Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ningxin Zheng, Ziheng Jiang, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Xin Liu

Tags: Computer science, Deep learning, LLM, nVidia, nVidia H800, PTX

March 30, 2025 by hgpu

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Jianling Li, Shangzhan Li, Zhenye Gao, Qi Shi, Yuxuan Li, Zefan Wang, Jiacheng Huang, Haojie Wang, Jianrong Wang, Xu Han, Zhiyuan Liu, Maosong Sun

Tags: Benchmarking, Code generation, Computer science, CUDA, Deep learning, LLM, nVidia, nVidia A100, Package, Python

March 3, 2025 by hgpu

CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads

Radostin Stoyanov, Viktória Spišaková, Jesus Ramos, Steven Gurfinkel, Andrei Vagin, Adrian Reber, Wesley Armour, Rodrigo Bruno

Tags: AMD Radeon Instinct MI210, ATI, Computer science, CUDA, Deep learning, nVidia, nVidia A100, nVidia H100, nVidia RTX A6000, Package, ROCm

March 3, 2025 by hgpu

Towards autonomous resource management: Deep learning prediction of CPU-GPU load balancing

Inigo Gabirondo Lopez

Tags: AMD Radeon HD 7970, Artificial intelligence, ATI, Computer science, Deep learning, Heterogeneous systems, load balancing, nVidia, nVidia GeForce GTX 970, OpenCL

February 10, 2025 by hgpu

Adaptive Optimization Techniques for High-Performance Computing

Gulsum Gudukbay Akbulut

Tags: Computer science, CUDA, Deep learning, HPC, Machine learning, nVidia, Optimization, Performance, Tesla K80, Thesis

January 27, 2025 by hgpu

Keras Sig: Efficient Path Signature Computation on GPU in Keras 3

Rémi Genet, Hugo Inzirillo

Source codes

Tags: Computer science, CUDA, Deep learning, Machine learning, nVidia, nVidia GeForce RTX 4090, Package, Python, TensorFlow

January 20, 2025 by hgpu

Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads

Avinash Maurya

Tags: AI, Artificial intelligence, Computer science, CUDA, Deep learning, HPC, nVidia, nVidia DGX-A100, Package, Thesis

December 29, 2024 by hgpu
