high performance computing on graphics processing units: hgpu.org

hgpu.org » Machine learning

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, Hyungjun Kim

Tags: Computer science, CUDA, Deep learning, Machine learning, Matrix multiplication, Mixed precision, nVidia, nVidia A100, nVidia GeForce RTX 4090, nVidia RTX A6000, Package

February 18, 2024 by hgpu

Evaluating the Wide Area Classroom After 24,000 HPC Students

John Urbanic, Thomas Maiden, Valerie Rossi

Tags: Education, HPC, Machine learning, MPI, OpenACC, OpenMP, Physics

February 12, 2024 by hgpu

A Heterogeneous Inference Framework for a Deep Neural Network

Rafael Gadea-Gironés, José Luís Rocabado-Rocha, Jorge Fe, Jose M. Monzo

Tags: Artificial intelligence, Computer science, Deep learning, FPGA, Heterogeneous systems, HLS, Machine learning, Neural networks, OpenCL, PyTorch

January 28, 2024 by hgpu

Orion: Interference-aware, Fine-grained GPU Sharing for ML Applications

Foteini Strati, Xianzhe Ma, Ana Klimovic

Tags: Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, nVidia A100, nVidia V100, Package, PyTorch

January 14, 2024 by hgpu

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

Fabrizio Ferrandi, Serena Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Alessio Burrello, Francesco Barchi, Luca Benini, Luciano Lavagno, Teodoro Urso, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Nicola Petra, Davide De Caro, Gennaro Di Meo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Francesco Silvestri, Paolo Palazzari, Stefania Perri

Tags: AI, Artificial intelligence, Computer science, CUDA, Deep learning, Design space exploration, Hardware Architecture, Heterogeneous systems, Machine learning, Neural networks, nVidia, nVidia H100, OpenCL, survey

December 3, 2023 by hgpu

Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++

Bin Lei, Caiwen Ding, Le Chen, Pei-Hung Lin, Chunhua Liao

Tags: Benchmarking, Computer science, CUDA, Fortran, Machine learning, nVidia, nVidia RTX A6000, OpenMP, Package

November 19, 2023 by hgpu

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

Mohammad Zubair, Christoph Bauinger

Tags: Computer science, CUDA, Deep learning, Intel, Intel Data Center GPU Max 1550, Linear Algebra, Machine learning, Mathematical Software, Matrix multiplication, nVidia, nVidia V100, Sparse matrix, SYCL

November 5, 2023 by hgpu

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies

Adrian Perez Dieguez, Margarita Amor Lopez

Tags: Computer science, CUDA, FFT, Machine learning, nVidia, nVidia Jetson TX1, Performance, performance portability

October 29, 2023 by hgpu

GEVO-ML: Optimizing Machine Learning Code with Evolutionary Computation

Jhe-Yu Liou, Stephanie Forrest, Carole-Jean Wu

Tags: Computer science, CUDA, Deep learning, Evolutionary Computations, Machine learning, Neural and Evolutionary Computing, nVidia, nVidia P100

October 29, 2023 by hgpu

Predicting the Execution Time of a kernel on a specific GPU using PTX code

Monika Dagar, Jorge Roldan

Tags: Computer science, CUDA, Deep learning, Machine learning, nVidia, nVidia GeForce GTX 1650, nVidia GeForce GTX Titan XP, Performance, PTX, Tesla K20, Tesla K80, Tesla M60, Tesla P100, Tesla T4, Tesla V100

October 22, 2023 by hgpu

OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials

Peter Eastman, Raimondas Galvelis, Raúl P. Peláez, Charlles R. A. Abreu, Stephen E. Farr, Emilio Gallicchio, Anton Gorenko, Michael M. Henry, Frank Hu, Jing Huang, Andreas Krämer, Julien Michel, Joshua A. Mitchell, Vijay S. Pande, João PGLM Rodrigues, Jaime Rodriguez-Guerra, Andrew C. Simmonett, Jason Swails, Ivy Zhang, John D. Chodera, Gianni De Fabritiis, Thomas E. Markland

Tags: AMD Radeon Pro V620, ATI, Chemical Physics, CUDA, HIP, Machine learning, Molecular dynamics, Molecular simulation, nVidia, nVidia A100, nVidia GeForce RTX 4080, OpenCL, Package, Physics

October 15, 2023 by hgpu

Memory Efficient Mixed-Precision Optimizers

Basile Lewandowski, Atli Kosson

Tags: Computer science, CUDA, Machine learning, Mixed precision, nVidia, nVidia A100, nVidia V100

October 1, 2023 by hgpu
