
Evaluating Operators in Deep Neural Networks for Improving Performance Portability of SYCL

Zheming Jin
Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
Oak Ridge National Laboratory, 2024

@techreport{jin2024evaluating,
   title={Evaluating Operators in Deep Neural Networks for Improving Performance Portability of SYCL},
   author={Jin, Zheming},
   year={2024},
   institution={Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)}
}

Download (PDF) | View | Source | Source codes

SYCL is a portable programming model for heterogeneous computing, so achieving reasonable performance portability with SYCL is important. Toward the goal of better understanding and improving the performance portability of SYCL for machine learning workloads, we have been developing benchmarks for basic operators in deep neural networks (DNNs). These operators can be offloaded to heterogeneous computing devices such as graphics processing units (GPUs) to speed up computation. In this work, we introduce the benchmarks, evaluate the performance of the operators on GPU-based systems, and describe the causes of the performance gap between the SYCL and Compute Unified Device Architecture (CUDA) kernels. We find that the causes are related to utilization of the texture cache for read-only data, optimization of memory accesses through strength reduction, shared local memory accesses, and register usage per thread. We hope that these efforts to develop benchmarks for studying performance portability will stimulate discussion and interaction within the community.
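As a minimal sketch of the first two causes the abstract names, the CUDA kernels below contrast a baseline with a version that routes read-only loads through the texture (read-only data) cache via `const __restrict__` and `__ldg`, and that replaces a per-iteration index multiplication with an additive induction variable (strength reduction). This is an illustrative example, not code from the report; the kernel names and the axpy-style workload are hypothetical, and a CUDA toolchain targeting sm_35 or newer is assumed for `__ldg`.

```cpp
// Hedged sketch (not from the report): two CUDA micro-optimizations of the
// kind the abstract cites as sources of the SYCL/CUDA performance gap.
#include <cstdio>
#include <cuda_runtime.h>

// Baseline kernel: plain pointers (aliasing cannot be ruled out, so loads
// may not use the read-only cache) and an index recomputed with a
// multiplication on every iteration of the strided loop.
__global__ void axpy_baseline(const float *x, float *y, float a, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = gridDim.x * blockDim.x;
  for (int k = 0; tid + k * stride < n; ++k)      // multiply each iteration
    y[tid + k * stride] += a * x[tid + k * stride];
}

// Optimized kernel: `const __restrict__` lets the compiler emit LDG loads
// through the read-only (texture) data cache, and the induction variable
// advances by `stride` (strength reduction: an add instead of a multiply).
__global__ void axpy_opt(const float * __restrict__ x,
                         float * __restrict__ y, float a, int n) {
  int stride = gridDim.x * blockDim.x;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
    y[i] += a * __ldg(&x[i]);                     // explicit read-only load
}

int main() {
  const int n = 1 << 20;
  float *x, *y;
  cudaMalloc(&x, n * sizeof(float));
  cudaMalloc(&y, n * sizeof(float));
  axpy_baseline<<<128, 256>>>(x, y, 2.0f, n);
  axpy_opt<<<128, 256>>>(x, y, 2.0f, n);
  cudaDeviceSynchronize();
  cudaFree(x);
  cudaFree(y);
  puts("done");
  return 0;
}
```

SYCL has no direct counterpart to `__ldg`, so whether read-only loads reach the texture cache depends on the compiler, which is one way such gaps between SYCL and CUDA kernels can arise.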
