Algorithm level power efficiency optimization for CPU-GPU processing element in data intensive SIMD/SPMD computing
Department of Computer Science, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, JAPAN 113-0033, JST, CREST, Japan
Journal of Parallel and Distributed Computing (20 October 2010)
@article{ren2010algorithm,
title={Algorithm level power efficiency optimization for CPU-GPU processing element in data intensive SIMD/SPMD computing},
author={Ren, D.Q.},
journal={Journal of Parallel and Distributed Computing},
issn={0743-7315},
year={2010},
publisher={Elsevier}
}
Power efficiency must be investigated at every level of a High Performance Computing (HPC) system because of the increasing computational demands of scientific and engineering applications. Focusing on the critical design constraints at the software level, where programs run on a parallel system composed of huge numbers of power-hungry components, we optimize HPC program design to achieve the best possible power performance on the target hardware platform. The power performance of a CUDA Processing Element (PE) is determined by both hardware factors, namely the power features of each component (CPU, GPU, main memory and PCI buses) and their interconnection architecture, and software factors, namely the algorithm design and the character of the executable instructions performed on it. In this paper, approaches to model and evaluate the power consumption of large-scale SIMD computation by CUDA PEs on multi-core and GPU platforms are introduced. The model allows design characteristic values to be obtained at the early programming stage, thus benefiting programmers by providing the environment information needed to choose the most power-efficient alternative. Based on the model, CPU dynamic frequency scaling (DFS) can be applied to the CUDA PE architecture, adjusting the CPU frequency to enhance the power efficiency of the entire PE without compromising its computing performance. The power model and the power efficiency improvements of the new designs have been validated by measuring the new programs on a real GPU multiprocessing system.
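The idea of applying CPU frequency scaling without compromising performance can be illustrated with a minimal sketch. This is not the paper's model: the class `PEModel`, the cubic CPU power law, the function names and all numeric values are assumptions made purely for illustration. The sketch treats PE power as the sum of its component powers (CPU, GPU, main memory, PCI bus) and picks the lowest CPU frequency at which the CPU-side work still finishes within the GPU kernel time, so the total runtime is unchanged.

```python
from dataclasses import dataclass


@dataclass
class PEModel:
    """Hypothetical power model of one CPU-GPU processing element."""
    gpu_power_w: float    # assumed constant GPU power while the kernel runs
    mem_power_w: float    # assumed constant main-memory power
    pci_power_w: float    # assumed constant PCI bus power
    cpu_static_w: float   # frequency-independent part of CPU power
    cpu_dyn_coeff: float  # assumed dynamic coefficient, in W per GHz^3

    def cpu_power(self, f_ghz: float) -> float:
        # Assumption: dynamic CPU power grows with the cube of frequency.
        return self.cpu_static_w + self.cpu_dyn_coeff * f_ghz ** 3

    def pe_power(self, f_ghz: float) -> float:
        # PE power is modeled as the sum of its component powers.
        return (self.cpu_power(f_ghz) + self.gpu_power_w
                + self.mem_power_w + self.pci_power_w)


def pick_cpu_frequency(freqs_ghz, cpu_gcycles, gpu_time_s):
    """Lowest frequency at which the CPU work hides behind the GPU kernel.

    cpu_gcycles is the CPU-side work in giga-cycles, so cpu_gcycles / f_ghz
    is the CPU time in seconds. Falls back to the highest frequency if no
    setting keeps the CPU off the critical path.
    """
    feasible = [f for f in freqs_ghz if cpu_gcycles / f <= gpu_time_s]
    return min(feasible) if feasible else max(freqs_ghz)


def energy_j(model, f_ghz, cpu_gcycles, gpu_time_s):
    """Energy of one run: PE power times the longer of CPU and GPU time."""
    runtime_s = max(gpu_time_s, cpu_gcycles / f_ghz)
    return model.pe_power(f_ghz) * runtime_s


if __name__ == "__main__":
    # All values below are made up for the example.
    model = PEModel(gpu_power_w=150.0, mem_power_w=10.0, pci_power_w=5.0,
                    cpu_static_w=20.0, cpu_dyn_coeff=5.0)
    freqs = [1.0, 2.0, 3.0]
    f = pick_cpu_frequency(freqs, cpu_gcycles=4.0, gpu_time_s=3.0)
    print("chosen CPU frequency (GHz):", f)
    print("energy at chosen frequency (J):", energy_j(model, f, 4.0, 3.0))
    print("energy at max frequency (J):", energy_j(model, max(freqs), 4.0, 3.0))
```

Under these assumptions the runtime is bounded by the GPU kernel, so lowering the CPU frequency reduces the energy of the whole PE while the completion time stays the same. A real model would need measured component powers rather than the constants used here.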
November 22, 2010 by hgpu