

FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System

Size Zheng, Yun Liang, Shuo Wang, Renze Chen, Kaiwen Sheng

CECA, Department of CS, Peking University

Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’20), 859–873, 2020

DOI:10.1145/3373376.3378508

@inproceedings{10.1145/3373376.3378508,
  author={Zheng, Size and Liang, Yun and Wang, Shuo and Chen, Renze and Sheng, Kaiwen},
  title={FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System},
  year={2020},
  isbn={9781450371025},
  publisher={Association for Computing Machinery},
  address={New York, NY, USA},
  url={https://doi.org/10.1145/3373376.3378508},
  doi={10.1145/3373376.3378508},
  booktitle={Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems},
  pages={859--873},
  numpages={15},
  keywords={heterogeneous systems, compiler optimization, code generation, machine learning},
  location={Lausanne, Switzerland},
  series={ASPLOS ’20}
}




Tensor computation plays a central role in a broad range of domains, including machine learning, data analytics, and scientific computing. The wide adoption of tensor computation and its high computational cost have created strong demand for flexible, portable, and high-performance library implementations on heterogeneous hardware accelerators such as GPUs and FPGAs. However, current tensor libraries mostly require programmers to design low-level implementations by hand and to optimize them from the algorithm, architecture, and compilation perspectives. This manual development process often takes months or even years, far slower than the rapid evolution of application algorithms. In this paper, we introduce FlexTensor, a schedule exploration and optimization framework for tensor computation on heterogeneous systems. FlexTensor optimizes tensor computation programs without human intervention, so programmers can work purely at a high-level programming abstraction without considering hardware platform details. FlexTensor systematically explores an optimization design space composed of many different schedules for different hardware. It then combines several exploration techniques, including heuristic and machine learning methods, to find an optimized schedule configuration. Finally, based on the exploration results, it automatically generates customized schedules for each hardware target. In our experiments, we test 12 kinds of tensor computations with hundreds of test cases in total. FlexTensor achieves an average speedup of 1.83x on an NVIDIA V100 GPU compared to cuDNN, 1.72x on an Intel Xeon CPU compared to MKL-DNN for 2D convolution, 1.5x on a Xilinx VU9P FPGA compared to OpenCL baselines, and 2.21x on an NVIDIA V100 GPU compared to the state of the art.
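To make the idea of exploring a schedule space concrete, here is a minimal sketch in Python. It is not FlexTensor's code or API. Every name in it (`schedule_space`, `toy_cost`, `greedy_search`) is hypothetical, the schedule is reduced to three tile sizes for a tiled matrix multiply, and the cost function is an assumed analytical memory-traffic estimate rather than a hardware measurement. The sketch shows only exhaustive and heuristic search; the machine learning part mentioned in the abstract is omitted.

```python
from itertools import product


def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def schedule_space(m, n, k):
    """Enumerate candidate schedules as (tile_m, tile_n, tile_k) tuples.

    Only tile sizes that divide the loop extent evenly are considered.
    """
    return list(product(divisors(m), divisors(n), divisors(k)))


def toy_cost(schedule, m, n, k, cache_elems=4096):
    """Assumed cost model: elements of A and B loaded by a tiled matmul.

    A schedule whose three tiles do not fit in a cache of `cache_elems`
    elements is treated as infeasible. This is an illustration only.
    """
    tm, tn, tk = schedule
    footprint = tm * tk + tk * tn + tm * tn
    if footprint > cache_elems:
        return float("inf")
    tiles = (m // tm) * (n // tn) * (k // tk)
    return tiles * (tm * tk + tk * tn)


def exhaustive_search(m, n, k):
    """Evaluate every schedule and return (best_schedule, best_cost)."""
    best = min(schedule_space(m, n, k), key=lambda s: toy_cost(s, m, n, k))
    return best, toy_cost(best, m, n, k)


def neighbors(schedule, extents):
    """Schedules reachable by moving one tile size to an adjacent divisor."""
    result = []
    for axis, extent in enumerate(extents):
        divs = divisors(extent)
        i = divs.index(schedule[axis])
        for j in (i - 1, i + 1):
            if 0 <= j < len(divs):
                candidate = list(schedule)
                candidate[axis] = divs[j]
                result.append(tuple(candidate))
    return result


def greedy_search(m, n, k, start=(1, 1, 1)):
    """Hill-climb from `start` until no neighbor lowers the cost."""
    extents = (m, n, k)
    current, current_cost = start, toy_cost(start, m, n, k)
    while True:
        best = min(neighbors(current, extents),
                   key=lambda s: toy_cost(s, m, n, k))
        best_cost = toy_cost(best, m, n, k)
        if best_cost >= current_cost:
            return current, current_cost
        current, current_cost = best, best_cost


if __name__ == "__main__":
    print("exhaustive:", exhaustive_search(64, 64, 64))
    print("greedy:    ", greedy_search(64, 64, 64))
```

Exhaustive enumeration is only practical for a toy space like this one. The greedy walk evaluates far fewer schedules but can stop at a local optimum, which is one reason a real framework would combine several exploration techniques.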

Tags: Algorithms, Computer science, CUDA, FPGA, Heterogeneous systems, Machine learning, nVidia, OpenCL, Package, Tesla V100

April 19, 2020 by hgpu
