hgpu.org » Latency
Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis
Hamdy Abdelkhalik, Yehia Arafa, Nandakishore Santhi, Abdel-Hameed Badawy
Tags: Benchmarking, Computer science, Hardware Architecture, Latency, nVidia, nVidia A100, Performance, PTX
August 28, 2022 by hgpu
Igor Baratta, Chris Richardson, Garth Wells
Tags: Benchmarking, Computer science, Latency, nVidia, nVidia A100, OpenCL, Package, Performance, SYCL
August 21, 2022 by hgpu
Most viewed papers (last 30 days)
- PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations
- Hardware Acceleration for Neural Networks: A Comprehensive Survey
- Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation
- AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
- The New Compiler Stack: A Survey on the Synergy of LLMs and Compilers
- SeedFold: Scaling Biomolecular Structure Prediction
- Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs
- KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
- GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs
- Optimal Software Pipelining and Warp Specialization for Tensor Core GPUs