
Parallelizing Multicore Cache Simulations using Heterogeneous Computing on General Purpose and Graphics Processors

Nikolaos Strikos, Georgios Keramidas, Stefanos Kaxiras
Department of Electrical and Computer Engineering, University of Patras, Greece
Fourth Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2011), 2011

@inproceedings{strikos2011parallelizing,
   title={Parallelizing Multicore Cache Simulations using Heterogeneous Computing on General Purpose and Graphics Processors},
   author={Strikos, N. and Keramidas, G. and Kaxiras, S.},
   booktitle={Fourth Workshop on Programmability Issues for Multi-Core Computers (MULTIPROG-2011)},
   pages={107},
   year={2011}
}



Traditional trace-driven memory system simulation is very time-consuming, and the advent of multi-cores exacerbates the problem. We propose a framework for accelerating trace-driven multi-core cache simulations by exploiting modern many-core Graphics Processing Units (GPUs). A straightforward first step is to rely on the inherent parallelism in cache simulations: a group of communicating cache sets can be simulated independently of, and concurrently with, other sets. Based on this, we map collections of communicating cache sets (each belonging to a different target cache) onto the same GPU block so that the simulated coherence traffic is local traffic in the GPU. However, this alone is not enough because activity is highly imbalanced across sets: some sets receive a flurry of activity while others receive little. Our solution is to load balance the simulated sets, based on activity, onto the computing element (host CPU or GPU) that can manage them most efficiently. We propose a heterogeneous computing approach in which the host CPU simulates the few most active sets, while the GPU is responsible for the many less active sets. Our experimental findings using the SPLASH-2 suite demonstrate that our cache simulator based on CPU-GPU cooperation achieves on average a 5.88x speedup over an alternative implementation running on the CPU and a 2.47x speedup over one running on the GPU, and these speedups scale well with the size of the simulated system.
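The two ideas in the abstract, per-set independence and activity-based load balancing, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `partition_by_set`, `split_hot_cold`, and `simulate_set`, the cache geometry constants, the LRU policy, and the `hot_fraction` threshold are all assumptions chosen for illustration. Coherence between caches is omitted, and both partitions run on the CPU here; the "GPU" list only marks which sets would be offloaded.

```python
from collections import OrderedDict, defaultdict

# Assumed (hypothetical) geometry of each simulated private cache.
LINE_BYTES = 64   # cache-line size in bytes
NUM_SETS = 8      # sets per cache
WAYS = 2          # associativity


def set_index(addr):
    """Map a byte address to its cache set index."""
    return (addr // LINE_BYTES) % NUM_SETS


def partition_by_set(trace):
    """Group (core, addr) accesses by set index, keeping trace order.

    Same-index sets of all simulated caches land in one group, so a group
    can be simulated without looking at any other group.
    """
    per_set = defaultdict(list)
    for core, addr in trace:
        per_set[set_index(addr)].append((core, addr))
    return per_set


def split_hot_cold(per_set, hot_fraction=0.25):
    """Rank set groups by activity; the busiest go to the CPU side.

    hot_fraction is an illustrative knob, not a value from the paper.
    """
    ranked = sorted(per_set, key=lambda s: len(per_set[s]), reverse=True)
    n_hot = max(1, int(len(ranked) * hot_fraction))
    return ranked[:n_hot], ranked[n_hot:]


def simulate_set(accesses):
    """Count misses for one set group, with LRU replacement per core."""
    lru = defaultdict(OrderedDict)  # core -> resident tags, oldest first
    misses = 0
    for core, addr in accesses:
        tag = addr // (LINE_BYTES * NUM_SETS)
        resident = lru[core]
        if tag in resident:
            resident.move_to_end(tag)         # hit: mark most recently used
        else:
            misses += 1
            resident[tag] = True
            if len(resident) > WAYS:
                resident.popitem(last=False)  # evict least recently used
    return misses


if __name__ == "__main__":
    # Tiny synthetic trace of (core, byte address) pairs.
    trace = [(0, 0), (1, 0), (0, 512), (0, 0), (0, 1024),
             (0, 512), (0, 64), (1, 64), (0, 128), (0, 192)]
    per_set = partition_by_set(trace)
    cpu_sets, gpu_sets = split_hot_cold(per_set)
    total = sum(simulate_set(per_set[s]) for s in cpu_sets + gpu_sets)
    print("CPU sets:", cpu_sets, "GPU sets:", gpu_sets, "misses:", total)
```

Because each group is simulated in isolation, the order in which groups are processed does not affect the miss count, which is what makes it possible to hand different groups to different computing elements.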
