Evaluating Operators in Deep Neural Networks for Improving Performance Portability of SYCL
Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
Oak Ridge National Laboratory, 2024
DOI: 10.2172/2404613
@techreport{jin2024evaluating,
  title       = {Evaluating Operators in Deep Neural Networks for Improving Performance Portability of SYCL},
  author      = {Jin, Zheming},
  institution = {Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)},
  year        = {2024},
  doi         = {10.2172/2404613}
}
SYCL is a portable programming model for heterogeneous computing, so achieving reasonable performance portability with SYCL is important. Toward the goal of better understanding and improving the performance portability of SYCL for machine learning workloads, we have been developing benchmarks for basic operators in deep neural networks (DNNs). These operators can be offloaded to heterogeneous computing devices such as graphics processing units (GPUs) to speed up computation. In this work, we introduce the benchmarks, evaluate the performance of the operators on GPU-based systems, and describe the causes of the performance gaps between the SYCL and Compute Unified Device Architecture (CUDA) kernels. We find that the causes are related to the use of the texture cache for read-only data, the optimization of memory accesses through strength reduction, shared local memory accesses, and register usage per thread. We hope that these efforts to develop benchmarks for studying performance portability will stimulate discussion and interaction within the community.
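To make the setting concrete, here is a minimal sketch of the kind of basic DNN operator such benchmarks target when offloaded to a GPU in SYCL. The element-wise ReLU below is a hypothetical illustration, not a kernel taken from the report's benchmark suite:

    #include <sycl/sycl.hpp>
    #include <vector>

    int main() {
      constexpr size_t n = 1 << 20;
      std::vector<float> h_in(n), h_out(n);
      for (size_t i = 0; i < n; ++i) h_in[i] = float(i) - float(n / 2);

      sycl::queue q{sycl::default_selector_v};
      float *d_in  = sycl::malloc_device<float>(n, q);
      float *d_out = sycl::malloc_device<float>(n, q);
      q.memcpy(d_in, h_in.data(), n * sizeof(float)).wait();

      // Each work-item computes one output element: out[i] = max(in[i], 0).
      q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        d_out[i] = sycl::fmax(d_in[i], 0.0f);
      }).wait();

      q.memcpy(h_out.data(), d_out, n * sizeof(float)).wait();
      sycl::free(d_in, q);
      sycl::free(d_out, q);
      return 0;
    }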
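Two of the gap causes named in the abstract lend themselves to a short illustration. On NVIDIA GPUs, CUDA kernels can route read-only data through the texture (read-only data) cache, for example via const __restrict__ qualified pointers or the __ldg() intrinsic, a hint for which plain SYCL code has no direct equivalent. Strength reduction replaces a per-iteration integer multiply in index calculations with a running addition; the plain C++ sketch below is an illustrative assumption, not code from the report:

    // Illustrative strided summation, before and after strength reduction
    // of the index arithmetic. Inside a GPU kernel the same rewrite removes
    // an integer multiply from every loop iteration.
    float sum_before(const float* in, int base, int stride, int m) {
      float sum = 0.0f;
      for (int k = 0; k < m; ++k)
        sum += in[base + k * stride];  // one multiply per access
      return sum;
    }

    float sum_after(const float* in, int base, int stride, int m) {
      float sum = 0.0f;
      for (int idx = base, k = 0; k < m; ++k, idx += stride)
        sum += in[idx];                // multiply replaced by an addition
      return sum;
    }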