hgpu.org » nVidia L40
Kaixuan Zhang, Yunfan Cui, Shuhao Zhang, Chutong Ding, Shiyou Qian, Luping Wang, Jian Cao, Guangtao Xue, Cheng Huang, Guodong Yang, Liping Zhang
Tags: Computer science, CUDA, Heterogeneous systems, Machine learning, nVidia, nVidia A100, nVidia A40, nVidia H100, nVidia H20, nVidia H200, nVidia H800, nVidia L20, nVidia L40, nVidia RTX 6000 Ada, Performance, Triton
January 25, 2026 by hgpu
Recent source codes
RepoLaunch: Automating Build and Test Pipeline of Code Repositories on ANY Language and ANY Platform
* * *
Most viewed papers (last 30 days)
- DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
- Deep Kernel Fusion for Transformers
- Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards
- StitchCUDA: An Automated Multi-Agents End-to-End GPU Programing Framework with Rubric-based Agentic Reinforcement Learning
- Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A
- Catalyst-Agent: Autonomous heterogeneous catalyst screening and optimization with an LLM Agent
- A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
- CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
- Joint Training on AMD and NVIDIA GPUs
- Fine-Tuning GPT-5 for GPU Kernel Generation
* * *