Customizable Memory Schemes for Data Parallel Accelerators

Chunyang Gou
Computer Engineering Laboratory, Electrical Engineering, Mathematics and Computer Science, Delft University of Technology
Delft University of Technology, 2011


   title={Customizable Memory Schemes for Data Parallel Accelerators},

   author={Chunyang, GOU},



Download Download (PDF)   View View   Source Source   



Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data parallel machines. Processing capabilities of parallel lanes will be wasted, when data requests are not accomplished in a sustainable and timely manner. Irregular vector memory accesses can lead to inefficient use of the parallel banks/modules/channels and significantly degrade overall performance even when highly parallel memory systems are employed. This problem is also valid for many regular workloads exhibiting irregular vector accesses at runtime. This dissertation identifies the mismatch between the optimal access patterns required by the workloads and the physical data layout as one of the major factors for memory access inefficiency. We propose customizable memory schemes to address this issue in data parallel accelerators. More specifically, this thesis extends traditional approaches by proposing two new parallel memory schemes that alleviate bank conflicts for commonly used access patterns. We also propose a framework to capture and convey the access pattern information to the proposed parallel memory schemes. Furthermore, we describe techniques that dynamically adjust the instruction sequencer of a multithreaded vector architecture and customize the access patterns to improve on-chip, local memory efficiency. Last, we identify and exploit new locality type to dynamically adjust off-chip memory access granularity of manycore data parallel architectures, in order to improve main memory efficiency. We implemented our proposals as extensions of contemporary data parallel architectures and our evaluation results demonstrate that memory efficiency and overall system performance can be improved at minimal hardware cost, while at the same time programming overhead can be greatly reduced.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: