Scalable framework for mapping streaming applications onto multi-GPU systems

Huynh Phung Huynh, Andrei Hagiescu, Weng-Fai Wong, Rick Siow Mong Goh

A*STAR Institute of High Performance Computing, Singapore, Singapore

17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’12), 2012

DOI:10.1145/2145816.2145818

Graphics processing units leverage a large array of parallel processing cores to boost the performance of a specific streaming computation pattern frequently found in graphics applications. Unfortunately, while many other general-purpose applications do exhibit the required streaming behavior, they also have unfavorable data layouts and poor computation-to-communication ratios that penalize any straightforward execution on the GPU. In this paper we describe an efficient and scalable code generation framework that can map general-purpose streaming applications onto a multi-GPU system. This framework spans the entire core and memory hierarchy exposed by the multi-GPU system. Several key features in our framework ensure the scalability required by complex streaming applications. First, we propose an efficient stream graph partitioning algorithm that partitions the complex application to achieve the best performance under a given shared memory constraint. Next, the resulting partitions are mapped to multiple GPUs using an efficient architecture-driven strategy. The mapping balances the workload while considering the communication overhead. Finally, the partitions are executed on the multi-GPU system in a highly effective pipelined fashion. The framework has been implemented as a back-end of the StreamIt programming language compiler. Our comprehensive experiments show its scalability and significant performance speedup compared with a previous state-of-the-art solution.
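To make the workload-balancing step concrete, here is a minimal sketch of how partitions might be assigned to GPUs. It is not the paper's architecture-driven strategy: it uses a plain greedy longest-processing-time heuristic and ignores communication overhead entirely. The function name `map_partitions_to_gpus` and the per-partition cost values are assumptions made for illustration only.

```python
import heapq


def map_partitions_to_gpus(workloads, num_gpus):
    """Assign partitions to GPUs with a greedy heaviest-first heuristic.

    Hypothetical illustration only: `workloads[i]` is an assumed cost
    estimate for partition i, and communication cost is not modeled.
    """
    # Min-heap of (accumulated load, GPU index): the least-loaded GPU is on top.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(num_gpus)}

    # Place the heaviest partitions first, each on the currently lightest GPU.
    ordered = sorted(enumerate(workloads), key=lambda item: -item[1])
    for partition_id, cost in ordered:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(partition_id)
        heapq.heappush(heap, (load + cost, gpu))

    # Total estimated load per GPU, for inspecting the resulting balance.
    loads = {gpu: sum(workloads[p] for p in parts)
             for gpu, parts in assignment.items()}
    return assignment, loads


assignment, loads = map_partitions_to_gpus([8, 7, 6, 5, 4], num_gpus=2)
print(assignment)  # {0: [0, 3, 4], 1: [1, 2]}
print(loads)       # {0: 17, 1: 13}
```

A mapping that also accounts for communication, as the abstract describes, would additionally need to weigh the data exchanged between partitions placed on different GPUs; this sketch leaves that out.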

Tags: Algorithms, Code generation, Computer science, CUDA, GPU cluster, nVidia, Tesla C2070

March 8, 2012 by hgpu
