hgpu.org » Latency
Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis
Hamdy Abdelkhalik, Yehia Arafa, Nandakishore Santhi, Abdel-Hameed Badawy
Tags: Benchmarking, Computer science, Hardware Architecture, Latency, nVidia, nVidia A100, Performance, PTX
August 28, 2022 by hgpu
Igor Baratta, Chris Richardson, Garth Wells
Tags: Benchmarking, Computer science, Latency, nVidia, nVidia A100, OpenCL, Package, Performance, SYCL
August 21, 2022 by hgpu
* * *
Most viewed papers (last 30 days)
- Dissecting the NVIDIA Blackwell Architecture with Microbenchmarks
- Performance Portable Gradient Computations Using Source Transformation
- ConTraPh: Contrastive Learning for Parallelization and Performance Optimization
- Specx: a C++ task-based runtime system for heterogeneous distributed architectures
- Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
- Understanding the Landscape of Ampere GPU Memory Errors
- Using Deep Reinforcement Learning for Automatic Code Optimization in the MLIR Compiler
- GBOTuner: Autotuning of OpenMP Parallel Codes with Bayesian Optimization and Code Representation Transfer Learning
- SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching
- Kevin: Multi-Turn RL for Generating CUDA Kernels
* * *