Merge: a programming model for heterogeneous multi-core systems

Michael D. Linderman, Jamison D. Collins, Hong Wang, Teresa H. Meng
Dept. of Electrical Engineering, Stanford University, Stanford, CA, USA
In ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems (2008), pp. 287-296


   title={Merge: a programming model for heterogeneous multi-core systems},
   author={Linderman, M.D. and Collins, J.D. and Wang, H. and Meng, T.H.},
   journal={ACM SIGOPS Operating Systems Review},











In this paper we propose the Merge framework, a general purpose programming model for heterogeneous multi-core systems. The Merge framework replaces current ad hoc approaches to parallel programming on heterogeneous platforms with a rigorous, library-based methodology that can automatically distribute computation across heterogeneous cores to achieve increased energy and performance efficiency. The Merge framework provides (1) a predicate dispatch-based library system for managing and invoking function variants for multiple architectures; (2) a high-level, library-oriented parallel language based on map-reduce; and (3) a compiler and runtime which implement the map-reduce language pattern by dynamically selecting the best available function implementations for a given input and machine configuration. Using a generic sequencer architecture interface for heterogeneous accelerators, the Merge framework can integrate function variants for specialized accelerators, offering the potential for to-the-metal performance for a wide range of heterogeneous architectures, all transparent to the user. The Merge framework has been prototyped on a heterogeneous platform consisting of an Intel Core 2 Duo CPU and an 8-core 32-thread Intel Graphics and Media Accelerator X3000, and a homogeneous 32-way Unisys SMP system with Intel Xeon processors. We implemented a set of benchmarks using the Merge framework and enhanced the library with X3000 specific implementations, achieving speedups of 3.6x to 8.5x using the X3000 and 5.2x to 22x using the 32-way system relative to the straight C reference implementation on a single IA32 core.
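The abstract's combination of predicate dispatch and map-reduce can be illustrated with a small sketch. This is not the Merge API or its language: every name here (`Variant`, `VariantLibrary`, `map_reduce`, the `has_accelerator` key, the chunk-size threshold) is a hypothetical stand-in, and the "accelerator" variant simply runs on the CPU. It only shows the general idea of registering several implementations of one function, each guarded by a predicate over the input and the machine configuration, and letting a runtime pick the most specialized applicable one for each chunk of work.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Variant:
    """One implementation of a logical function (hypothetical structure)."""
    name: str
    # Predicate over (input chunk, machine configuration).
    predicate: Callable[[Sequence[float], Dict[str, bool]], bool]
    impl: Callable[[Sequence[float]], float]
    priority: int  # higher means more specialized


class VariantLibrary:
    """Hypothetical registry holding all variants of one function."""

    def __init__(self) -> None:
        self._variants: List[Variant] = []

    def register(self, variant: Variant) -> None:
        self._variants.append(variant)

    def select(self, data: Sequence[float], machine: Dict[str, bool]) -> Variant:
        # Keep variants whose predicate holds, then take the most specialized.
        applicable = [v for v in self._variants if v.predicate(data, machine)]
        if not applicable:
            raise LookupError("no applicable variant")
        return max(applicable, key=lambda v: v.priority)


def sum_squares_generic(chunk: Sequence[float]) -> float:
    return sum(x * x for x in chunk)


def sum_squares_accel(chunk: Sequence[float]) -> float:
    # Stand-in for an accelerator-specific kernel; here it runs on the CPU.
    return sum(x * x for x in chunk)


library = VariantLibrary()
library.register(Variant("generic", lambda d, m: True, sum_squares_generic, 0))
library.register(Variant(
    "accelerator",
    # Assumed rule: use the accelerator only if present and the chunk is large enough.
    lambda d, m: m.get("has_accelerator", False) and len(d) >= 4,
    sum_squares_accel,
    10,
))


def map_reduce(data: Sequence[float], machine: Dict[str, bool],
               chunk_size: int = 4) -> Tuple[float, List[str]]:
    """Map: dispatch each chunk to a variant. Reduce: add the partial sums."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    chosen: List[str] = []
    partials: List[float] = []
    for chunk in chunks:
        variant = library.select(chunk, machine)
        chosen.append(variant.name)
        partials.append(variant.impl(chunk))
    return reduce(lambda a, b: a + b, partials, 0.0), chosen
```

Calling `map_reduce([1, 2, 3, 4, 5, 6], {"has_accelerator": True})` sends the full four-element chunk to the "accelerator" variant and the two-element remainder to the generic one; with an empty machine configuration every chunk falls back to the generic variant. The caller's code is the same in both cases, which is the kind of transparency the abstract describes.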
