Extending the Scalability of Single Chip Stream Processors with On-chip Caches
University of British Columbia, Vancouver, BC, Canada
2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, CMP-MSI 2008, 2008
@article{bakhodaextending,
  title={Extending the Scalability of Single Chip Stream Processors with On-chip Caches},
  author={Bakhoda, A. and Aamodt, T.M.},
  year={2008},
  publisher={Citeseer}
}
As semiconductor scaling continues, more transistors can be put onto the same chip despite growing challenges in clock frequency scaling. Stream processor architectures can make effective use of these additional resources for appropriate applications. However, it is important that programmer effort be amortized across future generations of stream processor architectures. Current industry projections suggest a single chip may be able to integrate several thousand 64-bit floating-point ALUs within the next decade. Future designs will require significantly larger, scalable on-chip interconnection networks, which will likely increase memory access latency. While the capacity of the explicitly managed local store of current stream processor architectures could be enlarged to tolerate the added latency, existing stream processing software may require significant programmer effort to leverage such modifications. In this paper we propose a scalable stream processing architecture that addresses this issue. In our design, each stream processor has an explicitly managed local store model backed by an on-chip cache hierarchy. We evaluate our design using several parallel benchmarks to show the trade-offs of various cache and DRAM configurations. We show that the addition of a 256KB L2 cache per memory controller increases the performance of our 16-, 64-, and 121-node stream processor designs (containing 128, 896, and 1760 ALUs, respectively) by 14.5%, 54.9%, and 82.3% on average. We find that even those applications in our study that utilize the local store benefit significantly from the addition of L2 caches.
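For readers unfamiliar with the "explicitly managed local store" model the abstract refers to, the minimal CUDA-style sketch below (not taken from the paper; the kernel name, tile size, and array names are illustrative assumptions) shows the programming pattern in question: the programmer explicitly stages data from off-chip DRAM into a per-core scratchpad before computing on it. The paper's proposal keeps this software-managed model but backs it with a hardware on-chip cache hierarchy, so that rising interconnect and memory latencies in larger designs are tolerated without re-tuning code like this.

// Illustrative sketch only: a tiled vector-scale kernel that stages data
// through the explicitly managed local store (CUDA "shared memory") before
// computing on it. Names and sizes are hypothetical, not from the paper.
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256  // elements staged into the local store per thread block

__global__ void scale_tiled(const float *in, float *out, int n, float alpha)
{
    __shared__ float tile[TILE];            // software-managed local store
    int idx = blockIdx.x * TILE + threadIdx.x;

    if (idx < n)
        tile[threadIdx.x] = in[idx];        // explicit copy: DRAM -> local store
    __syncthreads();                        // wait until the whole tile is resident

    if (idx < n)
        out[idx] = alpha * tile[threadIdx.x];  // compute out of the local store
}

int main()
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    scale_tiled<<<(n + TILE - 1) / TILE, TILE>>>(in, out, n, 2.0f);
    cudaDeviceSynchronize();

    printf("out[42] = %f\n", out[42]);      // expect 84.0
    cudaFree(in);
    cudaFree(out);
    return 0;
}

In this pattern the tile size is tuned to the local-store capacity; enlarging the local store to hide added latency would force such tuning to be redone, which is the programmer-effort problem the paper's cache-backed design aims to avoid.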
April 19, 2011 by hgpu