Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System
Computer Engineering Research Lab., Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
P2S2, 2013
@article{kim2013tiled,
title={Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System},
author={Kim, Dongjin and Park, Kyu-Ho},
year={2013}
}
Heterogeneous computing systems come in many forms, and among the most useful is the CPU and GPU computing system. On this system, we run QR decomposition, which expresses a real matrix as the product of two matrices: an orthogonal matrix Q and an upper-triangular matrix R. Tiled QR decomposition is a parallelized version of QR decomposition. Its central issue is deciding which device each tile is assigned to, because the computing devices are heterogeneous and communication between them is costly. The goal of this study is to optimize the tile distribution and the tiled QR decomposition operation mathematically for the given system. We select the main computing device for the main steps of the algorithm, optimize the number of devices, and optimize the tile distribution among the devices using a distribution guide array. Our evaluation confirms that our method scales well and that the optimization process maximizes tiled QR decomposition performance.
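The abstract does not say how the distribution guide array is built. The sketch below is a purely hypothetical illustration of the idea. It assumes that each device's relative throughput is known as a positive integer weight and that whole tile columns are assigned to devices. Under those assumptions, a guide array can be a short repeating pattern of device ids, built here with smooth weighted round-robin (an interleaving scheme used by some load balancers), so that a faster device receives proportionally more tile columns. The names `build_guide_array` and `assign_tile_columns`, and the weights in the example, are invented for this sketch and do not come from the paper.

```python
from functools import reduce
from math import gcd


def build_guide_array(weights):
    """Hypothetical guide array (not the paper's method): a repeating pattern
    of device ids in which device d appears in proportion to weights[d]."""
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")
    # Divide by the common gcd so the repeating pattern is as short as possible.
    g = reduce(gcd, weights)
    counts = [w // g for w in weights]
    total = sum(counts)
    credit = [0] * len(counts)
    guide = []
    # Smooth weighted round-robin: every slot, each device earns its count as
    # credit; the device with the most credit (lowest id on ties) takes the
    # slot and pays back `total`. Over `total` slots, device d is chosen
    # exactly counts[d] times, with its slots spread out rather than bunched.
    for _ in range(total):
        for d, c in enumerate(counts):
            credit[d] += c
        best = max(range(len(counts)), key=lambda d: credit[d])
        credit[best] -= total
        guide.append(best)
    return guide


def assign_tile_columns(num_tile_cols, guide):
    """Map tile column j to device guide[j % len(guide)]."""
    return [guide[j % len(guide)] for j in range(num_tile_cols)]


if __name__ == "__main__":
    # Hypothetical weights: device 0 = CPU, device 1 = GPU assumed 3x faster.
    guide = build_guide_array([1, 3])
    print(guide)                          # [1, 0, 1, 1]
    print(assign_tile_columns(6, guide))  # [1, 0, 1, 1, 1, 0]
```

With a CPU weighted 1 and a GPU weighted 3, the guide array is `[1, 0, 1, 1]`, and six tile columns map to devices `[1, 0, 1, 1, 1, 0]`. The sketch interleaves each device's slots rather than grouping them. This is meant to keep every device loaded as the factorization advances and the leftmost tile columns drop out of the remaining work. The abstract also names communication cost as a factor in tile placement, and this sketch ignores it.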
November 8, 2013 by hgpu