An Analysis of Programmer Productivity versus Performance for High Level Data Parallel Programming
Embedded Systems Lab, University of Leicester
Communicating Process Architectures, 2011
@article{cole2011analysis,
  title={An Analysis of Programmer Productivity versus Performance for High Level Data Parallel Programming},
  author={Cole, A. and McEwan, A. and Singh, S.},
  journal={Communicating Process Architectures},
  year={2011}
}
Data-parallel programming provides an accessible model for exploiting the power of parallel computing elements without resorting to the explicit use of low-level programming techniques based on locks, threads, and monitors. The emergence of Graphics Processing Units (GPUs) with hundreds or thousands of processing cores has made data-parallel computing available to a wider class of programmers. GPUs can be used not only for accelerating the processing of computer graphics but also for general-purpose data-parallel programming. Low-level data-parallel programming languages based on the Compute Unified Device Architecture (CUDA) provide an approach for developing programs for GPUs, but these languages require explicit creation and coordination of threads and careful data layout and movement. This has created a demand for higher-level programming languages and libraries which raise the abstraction level of data-parallel programming and increase programmer productivity. The Accelerator system was developed by Microsoft for writing data-parallel code in a high-level manner which can execute on GPUs, on multicore processors using SSE3 vector instructions, and on FPGA chips. This paper compares the performance and development effort of the high-level Accelerator system against lower-level systems which are more difficult to use but may yield better results. Specifically, we compare against the NVIDIA CUDA compiler and sequential C++ code, considering both the level of abstraction in the implementation code and the execution models. We compare the performance of these systems using several case studies. For some classes of problems, Accelerator achieves performance comparable to CUDA, while for others its performance is significantly reduced; in all cases, however, it provides a model which is easier to use and enables greater programmer productivity.
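For context (this sketch is illustrative, not taken from the paper's case studies), a minimal CUDA program for element-wise vector addition shows the explicit bookkeeping the abstract refers to: per-thread indexing, a hand-chosen thread-block geometry, and explicit host-to-device data movement. The kernel name vecAdd and all sizes and parameters here are hypothetical.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Element-wise vector addition: the kind of whole-array operation a
// high-level system expresses in one line, but which CUDA requires to
// be spelled out thread by thread.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x; // explicit thread indexing
    if (i < n)                                     // guard the final partial block
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // illustrative problem size
    const size_t bytes = n * sizeof(float);

    // Host-side data.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    // Explicit device allocation and data movement.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Explicit choice of thread-block geometry.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back and check one element.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[42] = %f\n", hc[42]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}

In a high-level system such as Accelerator, by contrast, the same computation is written as a single whole-array expression with no visible threads, indices, or copies, and the execution target (GPU, SSE3 multicore, or FPGA) is selected separately; the paper's comparison is about what this abstraction costs in performance and buys in productivity.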