Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor

Henry Wong, Anne Bracy, Ethan Schuchman, Tor M. Aamodt, Jamison D. Collins, Perry H. Wang, Gautham Chinya, Ankur K. Groen, Hong Jiang, Hong Wang
Dept. of Electrical and Computer Engineering, University of British Columbia
In PACT ’08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques (2008), pp. 52-61


   title={Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor},
   author={Wong, H. and Bracy, A. and Schuchman, E. and Aamodt, T.M. and Collins, J.D. and Wang, P.H. and Chinya, G. and Groen, A.K. and Jiang, H. and Wang, H.},
   booktitle={Proceedings of the 17th international conference on Parallel architectures and compilation techniques},








Moore’s Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically “fuses” existing CPU and GPU designs. Pangaea introduces (1) a resource repartitioning of the GPU, where the hardware budget dedicated to 3D-specific graphics processing is used to build more general-purpose GPU cores, and (2) a 3-instruction extension to the IA32 ISA that supports tighter architectural integration and fine-grain shared-memory collaborative multithreading between the IA32 CPU cores and the non-IA32 GPU cores. We implement Pangaea and the current CPU-GPU designs in fully functional synthesizable RTL based on the production-quality RTL of an IA32 CPU and an Intel GMA X4500 GPU. On a 65 nm ASIC process technology, the legacy graphics-specific fixed-function hardware occupies the area of 9 GPU cores and consumes the total power of 5 GPU cores. With the ISA extensions, the latency from the time an IA32 core spawns a GPU thread to the time the thread begins execution is reduced from thousands of cycles to fewer than 30 cycles. Pangaea is synthesized on an FPGA-based prototype and runs off-the-shelf IA32 OSes. A set of general-purpose non-graphics workloads demonstrates speedups of up to 8.8x.
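
To give a rough sense of why the spawn latency quoted in the abstract matters for fine-grain threading, the sketch below computes the fraction of cycles spent on useful work when each spawned thread pays a fixed start-up cost. It is a back-of-the-envelope model, not the authors' methodology: the function name `offload_efficiency`, the thread lengths, and the value 2000 standing in for "thousands of cycles" are all assumptions made for illustration; only the 30-cycle bound comes from the abstract.

```python
def offload_efficiency(work_cycles: int, spawn_latency_cycles: int) -> float:
    """Fraction of elapsed cycles spent on useful work for one spawned thread."""
    if work_cycles < 0 or spawn_latency_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    total = work_cycles + spawn_latency_cycles
    return work_cycles / total if total else 0.0


# Assumed stand-in: the abstract says only "thousands of cycles".
LEGACY_SPAWN_CYCLES = 2000
# Upper bound quoted in the abstract for the ISA-extended design.
PANGAEA_SPAWN_CYCLES = 30

if __name__ == "__main__":
    # Hypothetical thread lengths, in cycles, chosen only to show the trend.
    for work in (100, 1_000, 10_000):
        legacy = offload_efficiency(work, LEGACY_SPAWN_CYCLES)
        tight = offload_efficiency(work, PANGAEA_SPAWN_CYCLES)
        print(f"work={work:>6} cycles  legacy={legacy:.1%}  tight={tight:.1%}")
```

Under these assumed numbers, short threads are dominated by start-up cost in the loosely coupled case and remain mostly useful work in the tightly coupled case, which is the intuition behind offloading at fine granularity.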
