Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Yiyu Tan, Toshiyuki Imamura
RIKEN Center for Computational Science, 7-1-26 Minatojima-minami-machi, Chuo-ku, Kobe, Hyogo, Japan
International Conference on Parallel and Distributed Processing Techniques & Applications (PDPTA’18), 2018








Matrix multiplication is one of the fundamental building blocks of numerical linear algebra. It requires computer systems with huge computing capability and consumes much more power as the problem size increases. In this research, an OpenCL-based matrix multiplier is presented. For single-precision floating-point data, the proposed matrix multiplier, implemented on the DE5-NET FPGA board, is compared with software simulations based on the Intel MKL and OpenBLAS libraries carried out on a PC with 32 GB DDR RAM and an Intel i7-6800K processor running at 3.4 GHz. Although the FPGA implementation achieves much worse computation throughput, it gains 1.17 to 8.47 times and 1.54 to 11.27 times in energy efficiency relative to the Intel MKL and OpenBLAS runs, respectively, even though the FPGA is fabricated in 28 nm technology while the Intel i7-6800K processor is fabricated in 14 nm. Furthermore, the performance tuning results show that the matrix multiplier obtains the best performance when the block size is 64×64 and the kernel vectorization is 4.
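To make the block-size parameter concrete, the following is a minimal sketch of blocked (tiled) matrix multiplication in plain Python. It is not the authors' OpenCL kernel: the function name `blocked_matmul` and its `block` argument are hypothetical, and the sketch assumes that the "block size" in the abstract refers to the tile edge used to partition the matrices. Kernel vectorization is not modeled here.

```python
def blocked_matmul(a, b, block=64):
    """Multiply a (n x k) by b (k x m), processing block x block tiles.

    Illustrative only: `block` stands in for the tuned block size
    (64 in the abstract); this is an assumption, not the paper's code.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles of C and the shared dimension.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Inner loops accumulate one tile's partial products.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Tiling changes only the order in which partial products are accumulated, so the result matches an unblocked multiplication. On real hardware the block size determines how much of each matrix is held in fast local memory at once, which is why it is a natural tuning parameter.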
