EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system
Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation
SIGPLAN Not. In PLDI ’07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, Vol. 42 (June 2007), pp. 156-166.
@conference{wang2007exochi,
title={EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system},
author={Wang, P.H. and Collins, J.D. and Chinya, G.N. and Jiang, H. and Tian, X. and Girkar, M. and Yang, N.Y. and Lueh, G.Y. and Wang, H.},
booktitle={Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation},
pages={156--166},
year={2007},
organization={ACM}
}
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multicore platform, since these specialized accelerators feature ISAs and functionality that are significantly different from the general-purpose CPU cores. In this paper, we present EXOCHI: (1) Exoskeleton Sequencer (EXO), an architecture to represent heterogeneous accelerators as ISA-based MIMD architecture resources, and a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with general-purpose CPU cores, and (2) C for Heterogeneous Integration (CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreading programming, and produces a single fat binary with code sections corresponding to different instruction sets. The runtime can judiciously spread parallel computation across the heterogeneous cores to optimize performance and power. We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel Core 2 Duo processor and an 8-core 32-thread Intel Graphics Media Accelerator X3000. In addition, we have implemented the CHI integrated programming environment with the Intel C++ Compiler, runtime toolset, and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to utilize the accelerator through the CHI programming interface, achieving significant speedup (1.41X to 10.97X) over execution on the IA32 CPU alone.
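For readers unfamiliar with the OpenMP pragma that the abstract says CHI extends, the following is a minimal sketch of the standard, CPU-only form applied to a simple image kernel. It is not taken from the paper: the function name `brighten` and the kernel itself are illustrative assumptions, and CHI's accelerator-specific extensions and inline assembly are deliberately not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example kernel (not from the paper): brighten an 8-bit
 * grayscale image in place, saturating at 255. */
void brighten(uint8_t *pixels, size_t count, uint8_t delta)
{
    assert(pixels != NULL || count == 0);

    /* Standard OpenMP work-sharing pragma: loop iterations are divided
     * among CPU threads. According to the abstract, CHI extends this
     * pragma so that such regions can also target accelerator cores;
     * that extended syntax is not shown. If the compiler is run without
     * OpenMP support, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)count; ++i) {
        unsigned v = (unsigned)pixels[i] + delta;
        pixels[i] = (uint8_t)(v > 255u ? 255u : v);
    }
}
```

Each iteration touches only its own pixel, so the loop can be split across threads without synchronization. With GCC or Clang, the parallel version is enabled with `-fopenmp`.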
April 11, 2011 by hgpu