high performance computing on graphics processing units: hgpu.org

hgpu.org » PyTorch

LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning

Neha Prakriya, Zijian Ding, Yizhou Sun, Jason Cong

Tags: AMD Radeon Instinct MI250, ATI, Computer science, FPGA, HLS, LLM, Machine learning, Neural networks, PyTorch

May 4, 2025 by hgpu

Data-efficient LLM Fine-tuning for Code Generation

Weijie Lv, Xuan Xia, Sheng-Jun Huang

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia A100, Package, Python, PyTorch

April 27, 2025 by hgpu

Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs

Dimitar Mileski, Nikola Petrovski, Marjan Gusev

Tags: Computer science, CUDA, Deep learning, HPC, LLM, nVidia, nVidia A100, PTX, PyTorch

April 13, 2025 by hgpu

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch

Abhishek Ghosh, Ajay Nayak, Ashish Panwar, Arkaprava Basu

Tags: Benchmarking, Computer science, CUDA, Machine learning, nVidia, nVidia RTX A6000, PyTorch

March 30, 2025 by hgpu

KernelBench: Can LLMs Write Efficient GPU Kernels?

Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini

Tags: AI, Benchmarking, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia L40s, Package, PyTorch

February 24, 2025 by hgpu

Profiling Apple Silicon Performance for ML Training

Dahua Feng, Zhiming Xu, Rongxiang Wang, Felix Xiaozhu Lin

Tags: AI, Apple M2 Max, Apple M2 Pro, Apple M2 Ultra, Computer science, CUDA, Linear Algebra, LLM, Machine learning, nVidia, nVidia GeForce RTX 4090, nVidia GeForce RTX 2080 Ti, nVidia Quadro RTX 4000, nVidia RTX A6000, Performance, PyTorch

February 3, 2025 by hgpu

CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection

Ruijun Feng, Hammond Pearce, Pietro Liguori, Yulei Sui

Tags: Computer science, CUDA, LLM, nVidia, nVidia H100, Python, PyTorch, Security

January 13, 2025 by hgpu

LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

Jiaping Wang, Simiao Zhang, Qiao-Chu He, Yifan Chen

Tags: Benchmarking, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia A100, nVidia RTX A6000, Package, Python, PyTorch

January 13, 2025 by hgpu

TorchQC – A framework for efficiently integrating machine and deep learning methods in quantum dynamics and control

Dimitris Koutromanos, Dionisis Stefanatos, Emmanuel Paspalakis

Tags: Deep learning, Machine learning, Package, Physics, Python, PyTorch, Quantum Physics

December 29, 2024 by hgpu

Accelerating Sparse Graph Neural Networks with Tensor Core Optimization

Ka Wai Wu

Tags: Computer science, CUDA, Graph, Heterogeneous systems, Neural networks, nVidia, PyTorch, Tesla V100

December 24, 2024 by hgpu

Deep Learning Model Security: Threats and Defenses

Tianyang Wang, Ziqian Bi, Yichao Zhang, Ming Liu, Weiche Hsieh, Pohsun Feng, Lawrence K.Q. Yan, Yizhu Wen, Benji Peng, Junyu Liu, Keyu Chen, Sen Zhang, Ming Li, Chuanqi Jiang, Xinyuan Song, Junjie Yang, Bowen Jing, Jintao Ren, Junhao Song, Hong-Ming Tseng, Silin Chen, Yunze Wang, Chia Xin Liang, Jiawei Xu, Xuanhe Pan, Jinlang Wang, Qian Niu

Tags: Computer science, Deep learning, nVidia, PyTorch, Review, Security

December 15, 2024 by hgpu

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch

Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji

Tags: nVidia, nVidia GeForce RTX 3090, Package, PyTorch, Signal processing

August 18, 2024 by hgpu

Luthier: Bridging Auto-Tuning and Vendor Libraries for Efficient Deep Learning Inference

Fused Kernel Library (FKL)

The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries

GPUHammer: Rowhammer Attacks on GPU Memories are Practical

Block: Balance Loader of LLM Serving with Context, Knowledge and Predictive Scheduling

Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling

SIGMo: Scalable Isomorphism Graph Matching on GPUs

SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching

DGEMM without FP64 Arithmetic - using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme

GEAK-agent: LLM-based AI agent, which can write correct and efficient GPU kernels automatically

Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

OpenDwarfs 2025: re-engineered version of the OpenDwarfs benchmark suite, for compatibility with modern platforms

OpenDwarfs 2025: Modernizing the OpenDwarfs Benchmark Suite for Heterogeneous Computing

Specx: Speculative task-based runtime system

Specx: a C++ task-based runtime system for heterogeneous distributed architectures

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org