high performance computing on graphics processing units: hgpu.org

hgpu.org » Network communication

Anonymized Network Sensing using C++26 std::execution on GPUs

Michael Mandulak, Sayan Ghosh, S M Ferdous, Mahantesh Halappanavar, George Slota

View

Tags: Computer science, CUDA, Network communication, nVidia, nVidia A100

October 19, 2025 by hgpu

Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms

Zhiyi Hu, Siyuan Shen, Tommaso Bonato, Sylvain Jeaugey, Cedell Alexander, Eric Spada, Jeff Hammond, Torsten Hoefler

View

Tags: Computer science, CUDA, GPU cluster, Network communication, nVidia, nVidia GH200

July 13, 2025 by hgpu

CUDAnalyst (CUDA + Analyst)

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

CodegenBench

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Device Virtual Machine (DVM)

DVM: Real-Time Kernel Generation for Dynamic AI Models

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Agentic Code Optimization via Compiler-LLM Cooperation

Agentic Code Optimization via Compiler-LLM Cooperation

See all packages

* * *

* * *

HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors

Login | Sitemap | Feedback | Policy

Contact us: