Customizable Memory Schemes for Data Parallel Accelerators

Chunyang Gou

Computer Engineering Laboratory, Electrical Engineering, Mathematics and Computer Science, Delft University of Technology

Delft University of Technology, 2011

@article{chunyang2011customizable,
  title={Customizable Memory Schemes for Data Parallel Accelerators},
  author={Gou, Chunyang},
  year={2011}
}

Memory system efficiency is crucial for any processor to achieve high performance, and especially for data parallel machines. The processing capability of the parallel lanes is wasted when data requests are not served at a sustained rate and in a timely manner. Irregular vector memory accesses can lead to inefficient use of the parallel banks, modules, and channels, and can significantly degrade overall performance even when highly parallel memory systems are employed. The problem also arises in many regular workloads that exhibit irregular vector accesses at runtime. This dissertation identifies the mismatch between the optimal access patterns required by the workloads and the physical data layout as one of the major causes of memory access inefficiency. We propose customizable memory schemes to address this issue in data parallel accelerators. More specifically, this thesis extends traditional approaches by proposing two new parallel memory schemes that alleviate bank conflicts for commonly used access patterns. We also propose a framework to capture the access pattern information and convey it to the proposed parallel memory schemes. Furthermore, we describe techniques that dynamically adjust the instruction sequencer of a multithreaded vector architecture and customize the access patterns to improve the efficiency of on-chip local memory. Finally, we identify and exploit a new type of locality to dynamically adjust the off-chip memory access granularity of manycore data parallel architectures, in order to improve main memory efficiency. We implemented our proposals as extensions of contemporary data parallel architectures. Our evaluation results demonstrate that memory efficiency and overall system performance can be improved at minimal hardware cost, while programming overhead is greatly reduced.
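To make the notion of a bank conflict concrete, the following minimal sketch counts how many lanes of one vector access land in the same bank under two address-to-bank mappings. It is an illustration only and does not reproduce the memory schemes proposed in the dissertation: the bank count of 16, the function names, and the textbook row-skewing mapping are all assumptions made for this example.

```python
from collections import Counter

NUM_BANKS = 16  # assumed bank count, chosen for illustration only


def interleaved_bank(addr, num_banks=NUM_BANKS):
    """Low-order interleaving: consecutive words go to consecutive banks."""
    return addr % num_banks


def skewed_bank(addr, num_banks=NUM_BANKS):
    """Textbook row skewing (an assumption, not the thesis's scheme):
    each row of num_banks words is rotated by its row index."""
    return (addr + addr // num_banks) % num_banks


def strided_access(base, stride, lanes=NUM_BANKS):
    """Word addresses touched by one vector access, one address per lane."""
    return [base + i * stride for i in range(lanes)]


def conflict_degree(addresses, bank_fn):
    """Largest number of addresses in one access that map to the same bank.
    A value of 1 means the access is conflict-free."""
    counts = Counter(bank_fn(a) for a in addresses)
    return max(counts.values())


if __name__ == "__main__":
    for stride in (1, 16, 17):
        addrs = strided_access(0, stride)
        print(
            f"stride {stride:2d}: "
            f"interleaved={conflict_degree(addrs, interleaved_bank)}, "
            f"skewed={conflict_degree(addrs, skewed_bank)}"
        )
```

In this sketch, a stride of 16 sends all 16 lanes to a single bank under plain interleaving, while the skewed mapping spreads them over all banks. A stride of 17 behaves the other way around: it is conflict-free under interleaving but produces two-way conflicts under skewing. This is one small instance of the mismatch between access pattern and data layout that the abstract describes.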

Tags: Computer science, CUDA, Memory model, nVidia, nVidia GeForce GTX 280, Performance, Programming techniques, Thesis

December 17, 2011 by hgpu
