PARRAY: A Unifying Array Representation for Heterogeneous Parallelism
HCST Key Lab, School of EECS, Peking University, Beijing 100871, P.R.China
17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12), 2012
@inproceedings{chen2012parray,
  title={PARRAY: A Unifying Array Representation for Heterogeneous Parallelism},
  author={Chen, Yifeng and Cui, Xiang and Mei, Hong},
  booktitle={Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '12)},
  year={2012}
}
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports succinct, system-level programming for heterogeneous parallel systems such as GPU clusters. Current practice in software development for such systems requires combining several low-level libraries like Pthreads, OpenMP, CUDA, and MPI, and achieving productivity and portability across different numbers and models of GPUs is hard. PARRAY extends mainstream C programming with novel array types that have the following features: (1) the dimensions of an array type are nested in a tree structure, conceptually reflecting the memory hierarchy; (2) the definition of an array type may contain references to other array types, allowing sophisticated array types to be created for parallelization; (3) threads also form arrays, allowing programming in a Single-Program-Multiple-Codeblock (SPMC) style that unifies various sophisticated communication patterns. This leads to shorter, more portable, and more maintainable parallel code, while the programmer retains control over the performance-related features needed for deep manual optimization. Although the source-to-source code generator only faithfully generates low-level library calls according to the type information, higher-level programming and automatic performance optimization remain possible by building libraries of subprograms on top of PARRAY. A case study on cluster FFT illustrates a simple 30-line program that outperforms Intel Cluster MKL by a factor of two on the Tianhe-1A system with 7168 Fermi GPUs and 14336 CPUs.
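As a rough illustration of the first two array-type features, the plain-C sketch below (not the paper's actual PARRAY notation, whose concrete syntax differs; all names and sizes here are hypothetical) shows the address arithmetic that such dimension trees denote: nesting a dimension refines the index structure without moving any bytes, while a second type that references the first with its dimensions reordered describes a transposed layout of the same data.

#include <stdio.h>

/* A flat 2D array type, e.g. float [M][N], lays rows out contiguously:
   element (i, j) lives at linear offset i*N + j. */
#define M 4
#define N 6
static long offset_row_major(long i, long j) { return i * N + j; }

/* Nesting the column dimension, e.g. float [M][[N1][N2]] with N == N1*N2,
   splits column index j into (j1, j2) = (j / N2, j % N2) but keeps the
   same linear offset: the dimension tree refines the view, not the bytes. */
#define N1 2
#define N2 3
static long offset_nested(long i, long j1, long j2) {
    return i * (N1 * N2) + j1 * N2 + j2;
}

/* A second array type may reference the first with dimensions reordered,
   e.g. a column-major (transposed) layout of the same M*N elements:
   logical element (i, j) now lives at linear offset j*M + i. */
static long offset_col_major(long i, long j) { return j * M + i; }

int main(void) {
    /* The same logical element (1, 4) under the three layouts: */
    printf("row-major  : %ld\n", offset_row_major(1, 4));          /* 10 */
    printf("nested view: %ld\n", offset_nested(1, 4 / N2, 4 % N2)); /* 10 */
    printf("transposed : %ld\n", offset_col_major(1, 4));          /* 17 */
    return 0;
}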
January 5, 2012 by hgpu