Modeling Deep Learning Accelerator Enabled GPUs

Md Aamir Raihan, Negar Goli, Tor Aamodt
Electrical and Computer Engineering, University of British Columbia
arXiv:1811.08309 [cs.MS] (19 Nov 2018)











The efficacy of deep learning has made it one of the most important applications run in data centers today. The NVIDIA Tesla V100 GPU introduced a specialized functional unit called the Tensor Core to meet growing demand for higher performance on this workload. To exploit the full capability of current NVIDIA GPUs, machine learning researchers have started to use Tensor Cores. For example, five of the six 2018 Gordon Bell Award finalists used Tensor Cores in their work. However, no open-source GPU microarchitectural simulator currently models Tensor Cores. In this paper, we comprehensively investigate NVIDIA’s Tensor Core implementation found in the Volta and Turing architectures and propose an architectural model for it. Our Tensor Core timing model, implemented in GPGPU-Sim, achieves 99.6% IPC correlation versus a physical V100 GPU. Building upon this, we also enable GPGPU-Sim to run NVIDIA’s CUTLASS, an open-source CUDA C++ template library that provides customizable GEMM templates, including support for Tensor Cores.
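For readers unfamiliar with the operation a Tensor Core accelerates, the following is a minimal functional sketch of a mixed-precision matrix multiply-accumulate, D = A x B + C, on a small tile. It is not the timing model described in the paper. The 4x4 tile size and the FP16-input, FP32-accumulate arrangement follow NVIDIA's public description of Volta; the helper names (`to_fp16`, `to_fp32`, `mma_tile`) and the choice to round the accumulator after every addition are assumptions made for illustration only.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma_tile(a, b, c):
    """Return D = A x B + C for square tiles given as lists of rows.

    A and B are rounded to FP16 before use. The accumulator is rounded to
    FP32 after each addition, which is an assumption of this sketch and
    not a claim about the hardware's internal rounding.
    """
    n = len(a)
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # The product of two FP16 values is exact in a Python float.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            d[i][j] = acc
    return d


# Usage: a 4x4 tile (assumed size) with an identity A, so D equals B + C.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b_tile = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
c_tile = [[1.0] * 4 for _ in range(4)]
d_tile = mma_tile(identity, b_tile, c_tile)
```

Rounding the inputs through `struct`'s half-precision format makes the precision loss visible without any third-party dependency; for example, 0.1 becomes 0.0999755859375 once stored as FP16.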