PARRAY: A Unifying Array Representation for Heterogeneous Parallelism
HCST Key Lab, School of EECS, Peking University, Beijing 100871, P.R.China
17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12), 2012
@inproceedings{chen2012parray,
  title={PARRAY: A Unifying Array Representation for Heterogeneous Parallelism},
  author={Chen, Yifeng and Cui, Xiang and Mei, Hong},
  booktitle={Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '12)},
  year={2012}
}
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports succinct, system-level programming for heterogeneous parallel systems such as GPU clusters. Current practice in software development for such systems requires combining several low-level libraries like Pthreads, OpenMP, CUDA, and MPI, and achieving productivity and portability across different numbers and models of GPUs is hard. PARRAY extends mainstream C programming with novel array types that have the following features: (1) the dimensions of an array type are nested in a tree structure, conceptually reflecting the memory hierarchy; (2) the definition of an array type may contain references to other array types, allowing sophisticated array types to be created for parallelization; (3) threads also form arrays, allowing programming in a Single-Program-Multiple-Codeblock (SPMC) style that unifies various sophisticated communication patterns. This leads to shorter, more portable, and more maintainable parallel code, while the programmer retains control over the performance-related features needed for deep manual optimization. Although the source-to-source code generator only faithfully generates low-level library calls according to the type information, higher-level programming and automatic performance optimization remain possible by building libraries of subprograms on top of PARRAY. A case study on cluster FFT illustrates a simple 30-line program that outperforms Intel Cluster MKL by a factor of two on the Tianhe-1A system with 7168 Fermi GPUs and 14336 CPUs.
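As a rough illustration of the first two array-type features, the plain-C sketch below (not the paper's actual PARRAY notation, whose concrete syntax differs; all names and sizes here are hypothetical) shows the address arithmetic that such dimension trees denote: nesting a dimension refines the index structure without moving any bytes, while a second type that references the first with its dimensions reordered describes a transposed layout of the same data.

#include <stdio.h>

/* A flat 2D array type, e.g. float [M][N], lays rows out contiguously:
   element (i, j) lives at linear offset i*N + j. */
#define M 4
#define N 6
static long offset_row_major(long i, long j) { return i * N + j; }

/* Nesting the column dimension, e.g. float [M][[N1][N2]] with N == N1*N2,
   splits column index j into (j1, j2) = (j / N2, j % N2) but keeps the
   same linear offset: the dimension tree refines the view, not the bytes. */
#define N1 2
#define N2 3
static long offset_nested(long i, long j1, long j2) {
    return i * (N1 * N2) + j1 * N2 + j2;
}

/* A second array type may reference the first with dimensions reordered,
   e.g. a column-major (transposed) layout of the same M*N elements:
   logical element (i, j) now lives at linear offset j*M + i. */
static long offset_col_major(long i, long j) { return j * M + i; }

int main(void) {
    /* The same logical element (1, 4) under the three layouts: */
    printf("row-major  : %ld\n", offset_row_major(1, 4));          /* 10 */
    printf("nested view: %ld\n", offset_nested(1, 4 / N2, 4 % N2)); /* 10 */
    printf("transposed : %ld\n", offset_col_major(1, 4));          /* 17 */
    return 0;
}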
January 5, 2012 by hgpu