Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
Dept. of Electrical and Computer Engineering, University of British Columbia
In PACT ’08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques (2008), pp. 52-61
@conference{wong2008pangaea,
title={Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor},
author={Wong, H. and Bracy, A. and Schuchman, E. and Aamodt, T.M. and Collins, J.D. and Wang, P.H. and Chinya, G. and Groen, A.K. and Jiang, H. and Wang, H.},
booktitle={Proceedings of the 17th international conference on Parallel architectures and compilation techniques},
pages={52--61},
year={2008},
organization={ACM}
}
Moore’s Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically “fuses” existing CPU and GPU designs. Pangaea introduces (1) a resource repartitioning of the GPU, where the hardware budget dedicated to 3D-specific graphics processing is used to build more general-purpose GPU cores, and (2) a 3-instruction extension to the IA32 ISA that supports tighter architectural integration and fine-grain shared-memory collaborative multithreading between the IA32 CPU cores and the non-IA32 GPU cores. We implement Pangaea and the current CPU-GPU designs in fully functional synthesizable RTL based on the production-quality RTL of an IA32 CPU and an Intel GMA X4500 GPU. On a 65 nm ASIC process technology, the legacy graphics-specific fixed-function hardware has the area of 9 GPU cores and the total power consumption of 5 GPU cores. With the ISA extensions, the latency from the time an IA32 core spawns a GPU thread to the time the thread begins execution is reduced from thousands of cycles to fewer than 30 cycles. Pangaea is synthesized on an FPGA-based prototype and runs off-the-shelf IA32 OSes. A set of general-purpose non-graphics workloads demonstrates speedups of up to 8.8x.
December 7, 2010 by hgpu