hgpu.org » nVidia L40s
Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini
Tags: AI, Benchmarking, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia L40s, Package, PyTorch
February 24, 2025 by hgpu
Bertil Schmidt, Felix Kallenborn, Alexander Wichmann, Alejandro Chacon, Christian Hundt
Tags: Bioinformatics, Biology, Computer science, CUDA, FPGA, Genomics, nVidia, nVidia A100, nVidia H100, nVidia L4, nVidia L40s, nVidia V100, Package
November 24, 2024 by hgpu
Recent source codes
* * *
Most viewed papers (last 30 days)
- The Anatomy of a Triton Attention Kernel
- CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
- KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit
- An MLIR pipeline for offloading Fortran to FPGAs via OpenMP
- RDMA Point-to-Point Communication for LLM Systems
- ProofWright: Towards Agentic Formal Verification of CUDA
- QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation
- Inside VOLT: Designing an Open-Source GPU Compiler
- Iris: First-Class Multi-GPU Programming Experience in Triton
- AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs
* * *




