Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System
Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
4th International Workshop on Computer Systems and Architectures (CSA’16), 2016
In this paper, we present a data decomposition method for multi-dimensional data that aims to accelerate, on multiple graphics processing units (GPUs), a compute unified device architecture (CUDA) code written for a single GPU. Our multi-dimensional method extends a previous method that handles only one-dimensional (1-D) data. The method performs a sample run of selected GPU threads to decompose large data into small segments, which avoids exhausting GPU memory. Compared with the previous method, our multi-dimensional method produces smaller segments, so it consumes less GPU memory and reduces the amount of CPU-GPU data transfer. In experiments using matrix multiplication, the presented method consumed less GPU memory than the previous method and thereby successfully processed matrices 29 times larger, provided that the matrices fit into CPU memory. However, we found that the index transformation needed for multi-dimensional decomposition reduced the effective performance by 28%.
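The index transformation mentioned in the abstract can be illustrated with a minimal host-side sketch. This is not the authors' implementation: it is plain Python rather than CUDA, it involves no sample run of GPU threads, and the names `extract_block`, `matmul_tiled`, and `matmul_reference` are hypothetical. It assumes square row-major matrices and shows only why a compact 2-D segment forces each global index to be remapped to a local one.

```python
def matmul_reference(A, B, n):
    """Plain triple loop on row-major flat lists, used as ground truth."""
    C = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            C[i * n + j] = sum(A[i * n + k] * B[k * n + j] for k in range(n))
    return C


def extract_block(M, n, row0, col0, rows, cols):
    """Copy a rows x cols sub-block of an n x n matrix into a compact buffer."""
    return [M[(row0 + r) * n + (col0 + c)]
            for r in range(rows) for c in range(cols)]


def matmul_tiled(A, B, n, tile):
    """Compute C = A * B one output tile at a time from compact segments.

    Each output tile reads only a row band of A and a column band of B,
    which stand in for the small segments copied to a GPU (assumption).
    """
    C = [0] * (n * n)
    for i0 in range(0, n, tile):
        th = min(tile, n - i0)                         # tile height
        a_seg = extract_block(A, n, i0, 0, th, n)      # th x n band of A
        for j0 in range(0, n, tile):
            tw = min(tile, n - j0)                     # tile width
            b_seg = extract_block(B, n, 0, j0, n, tw)  # n x tw band of B
            for i in range(i0, i0 + th):
                for j in range(j0, j0 + tw):
                    acc = 0
                    for k in range(n):
                        # Index transformation:
                        #   A: global (i, k) -> local (i - i0, k), row stride n
                        #   B: global (k, j) -> local (k, j - j0), row stride tw
                        acc += a_seg[(i - i0) * n + k] * b_seg[k * tw + (j - j0)]
                    C[i * n + j] = acc
    return C
```

In this sketch the column band of B is not contiguous in the original row-major array, so a decomposition along a single dimension could not isolate it without including the rest of each row. Copying it into a compact buffer keeps the segment small, at the cost of the extra subtraction and the changed row stride on every access.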
November 1, 2016 by hgpu