Simple optimizations for an applicative array language for graphics processors
Department of Computer Science, Tufts University
Proceedings of the sixth workshop on Declarative aspects of multicore programming, DAMP ’11, 2011
@inproceedings{larsen2011simple,
  title        = {Simple optimizations for an applicative array language for graphics processors},
  author       = {Larsen, B.},
  booktitle    = {Proceedings of the sixth workshop on Declarative aspects of multicore programming},
  pages        = {25--34},
  year         = {2011},
  organization = {ACM}
}
Graphics processors (GPUs) are highly parallel devices that promise high performance, and they are now flexible enough to be used for general-purpose computing. A programming language based on implicitly data-parallel collective array operations can permit high-level, effective programming of GPUs. I describe three optimizations for such a language: automatic use of GPU shared memory cache, array fusion, and hoisting of nested parallel constructs. These optimizations are simple to implement because of the design of the language to which they are applied but can result in large run-time speedups.
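The paper's source language and compiler are not reproduced here, but a minimal CUDA sketch (not taken from the paper; the kernel names and example operations are hypothetical) illustrates what array fusion buys on a GPU: composing two elementwise operations into a single kernel removes the intermediate array and its round trip through global memory.

// Hypothetical sketch, not code from the paper: the effect of array fusion
// on generated GPU code.

// Unfused: two kernels, with an intermediate array tmp held in global memory.
__global__ void square(const float *in, float *tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = in[i] * in[i];
}

__global__ void add_one(const float *tmp, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + 1.0f;
}

// Fused: one kernel, no intermediate array, and roughly half the
// global-memory traffic for the same result.
__global__ void square_then_add_one(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i] + 1.0f;
}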
September 23, 2011 by hgpu