hgpu.org » nVidia L20
Borui Wan, Gaohong Liu, Zuquan Song, Jun Wang, Yun Zhang, Guangming Sheng, Shuguang Wang, Houmin Wei, Chenyuan Wang, Weiqiang Lou, Xi Yang, Mofan Zhang, Kaihua Jiang, Cheng Ren, Xiaoyun Zhi, Menghan Yu, Zhe Nan, Zhuolin Zheng, Baoquan Zhong, Qinlong Wang, Huan Yu, Jinxin Chi, Wang Zhang, Yuhan Li, Zixian Du, Sida Zhao, Yongqiang Zhang, Jingzhe Tang, Zherui Liu, Chuan Wu, Yanghua Peng, Haibin Lin, Wencong Xiao, Xin Liu, Liang Xiang
Tags: AI, Computer science, CUDA, LLM, nVidia, nVidia L20
September 28, 2025 by hgpu
Recent source codes
* * *
Most viewed papers (last 30 days)
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
- Accurate Models of NVIDIA Tensor Cores
- Microbenchmarking NVIDIA's Blackwell Architecture: An in-depth Architectural Analysis
- TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization
- PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations
- cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution
- Decoupled Triton: A Block-Level Decoupled Language for Writing and Exploring Efficient Machine-Learning Kernels
- Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation
- Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation
- Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters
* * *



