An Optimized GPU Memory Hierarchy Design for an OpenCL Kernel
Courant Institute of Mathematical Sciences, New York University
Courant Institute of Mathematical Sciences, 2014
@article{khan2014optimized,
title={An Optimized GPU Memory Hierarchy Design for an OpenCL Kernel},
author={Khan, Numair},
year={2014}
}
With the advent of multi- and many-core processors, communication has replaced computation as the performance bottleneck. Most current approaches to the problem try to tolerate memory access latency through a high degree of Thread-Level Parallelism. However, not all applications benefit from such techniques, and there is a need to address the weakness of the underlying memory system instead. This paper attempts to devise a physical memory organization for Graphics Processing Units that achieves performance gains for applications that do not effectively tolerate latency. Our resulting memory configuration performs up to three times faster than an AMD Evergreen GPU.
December 16, 2014 by hgpu