high performance computing on graphics processing units: hgpu.org


Merge: a programming model for heterogeneous multi-core systems

Michael D. Linderman, Jamison D. Collins, Hong Wang, Teresa H. Meng

Dept. of Electrical Engineering, Stanford University, Stanford, CA, USA

In ASPLOS XIII: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (2008), pp. 287-296

DOI:10.1145/1346281.1346318


In this paper we propose the Merge framework, a general-purpose programming model for heterogeneous multi-core systems. The Merge framework replaces current ad hoc approaches to parallel programming on heterogeneous platforms with a rigorous, library-based methodology that can automatically distribute computation across heterogeneous cores to achieve increased energy and performance efficiency. The Merge framework provides (1) a predicate dispatch-based library system for managing and invoking function variants for multiple architectures; (2) a high-level, library-oriented parallel language based on map-reduce; and (3) a compiler and runtime that implement the map-reduce language pattern by dynamically selecting the best available function implementations for a given input and machine configuration. Using a generic sequencer architecture interface for heterogeneous accelerators, the Merge framework can integrate function variants for specialized accelerators, offering the potential for to-the-metal performance on a wide range of heterogeneous architectures, all transparent to the user. The Merge framework has been prototyped on a heterogeneous platform consisting of an Intel Core 2 Duo CPU and an 8-core, 32-thread Intel Graphics and Media Accelerator X3000, and on a homogeneous 32-way Unisys SMP system with Intel Xeon processors. We implemented a set of benchmarks using the Merge framework and enhanced the library with X3000-specific implementations, achieving speedups of 3.6x to 8.5x using the X3000 and 5.2x to 22x using the 32-way system, relative to the straight C reference implementation on a single IA32 core.
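The abstract combines two ideas: a library of function variants chosen by predicate dispatch, and a map-reduce pattern whose runtime selects a variant for each piece of input. The following is a minimal Python sketch of how those ideas fit together, not the Merge implementation. It assumes a hypothetical `Machine` descriptor, a hypothetical `specificity` ranking used to pick among applicable variants, sequential execution in place of real distribution across cores, and a CPU stand-in for an accelerator kernel; none of these names come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical description of the target machine (not Merge's actual interface).
@dataclass
class Machine:
    has_accelerator: bool

# One implementation ("variant") of a library function, guarded by a predicate.
@dataclass
class Variant:
    name: str
    predicate: Callable[[Sequence[float], Machine], bool]  # applicable to this input/machine?
    specificity: int  # assumed tie-breaker: more specialized variants rank higher
    body: Callable[[Sequence[float]], float]

class VariantLibrary:
    """Registry of variants for one function; dispatch picks among applicable ones."""

    def __init__(self) -> None:
        self._variants: List[Variant] = []

    def register(self, variant: Variant) -> None:
        self._variants.append(variant)

    def dispatch(self, chunk: Sequence[float], machine: Machine) -> Variant:
        # Predicate dispatch: keep only variants whose predicate holds,
        # then prefer the most specialized one.
        applicable = [v for v in self._variants if v.predicate(chunk, machine)]
        if not applicable:
            raise LookupError("no applicable variant for this input")
        return max(applicable, key=lambda v: v.specificity)

def map_reduce(library: VariantLibrary, data: Sequence[float], machine: Machine,
               chunk_size: int, combine: Callable[[float, float], float],
               init: float) -> Tuple[float, List[str]]:
    """Map each chunk through a dispatched variant, then fold the partial results.

    Runs sequentially; a real runtime would spread chunks across cores.
    Returns the reduced value and the name of the variant chosen for each chunk.
    """
    result = init
    chosen: List[str] = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        variant = library.dispatch(chunk, machine)
        chosen.append(variant.name)
        result = combine(result, variant.body(chunk))
    return result, chosen

def generic_sumsq(chunk: Sequence[float]) -> float:
    # Portable variant: always applicable.
    return sum(x * x for x in chunk)

def blocked_sumsq(chunk: Sequence[float]) -> float:
    # CPU stand-in for a hypothetical accelerator kernel working on 4-wide blocks.
    return sum(sum(x * x for x in chunk[i:i + 4]) for i in range(0, len(chunk), 4))

lib = VariantLibrary()
lib.register(Variant("generic_sumsq", lambda c, m: True, 0, generic_sumsq))
lib.register(Variant("accelerator_sumsq",
                     lambda c, m: m.has_accelerator and len(c) % 4 == 0,
                     1, blocked_sumsq))

if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    total, used = map_reduce(lib, data, Machine(has_accelerator=True), 4,
                             lambda a, b: a + b, 0.0)
    print(total, used)  # 285.0 ['accelerator_sumsq', 'accelerator_sumsq', 'generic_sumsq']
```

With an accelerator present, the two full 4-element chunks go to the specialized variant and the 2-element tail falls back to the generic one; on a machine without an accelerator every chunk uses the generic variant and the result is unchanged. Ranking by a fixed `specificity` is only one possible policy; the abstract says the runtime selects the best available implementation for the given input and machine configuration, which could involve richer criteria.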

Tags: Architecture, Computer science, Intel, Intel GMA X3000, Programming techniques

December 10, 2010 by hgpu
