On the Use of Small 2D Convolutions on GPUs
Delft University of Technology, Delft, The Netherlands
Computer Architecture, Lecture Notes in Computer Science, Volume 6161/2012, 52-64, 2012
@inproceedings{al2012use,
title={On the Use of Small 2D Convolutions on GPUs},
author={Al Umairy, S. and van Amesfoort, A. and Setija, I. and van Beurden, M. and Sips, H.},
booktitle={Computer Architecture},
pages={52--64},
year={2012},
publisher={Springer}
}
Computing many small 2D convolutions using FFTs is a basis for a large number of applications in many domains in science and engineering, among them electromagnetic diffraction modeling in physics. The GPU architecture seems to be a suitable architecture to accelerate these convolutions, but reaching high application performance requires substantial development time and non-portable optimizations. In this work, we present the techniques, performance results and considerations to accelerate small 2D convolutions using CUDA, and compare performance to a multi-threaded CPU implementation. To improve programmability and performance of applications that make heavy use of small convolutions, we argue that two improvements to software and hardware are needed: FFT libraries must be extended with a single convolution function and communication bandwidth between CPU and GPU needs to be drastically improved.
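For readers unfamiliar with the technique the abstract refers to, the following is a minimal CPU-side sketch of an FFT-based 2D convolution wrapped in a single function call, roughly the kind of interface the authors argue FFT libraries should offer. It is not the paper's CUDA implementation. All function names (`fft`, `fft2`, `conv2d_fft`, `conv2d_direct`) are hypothetical, and the zero-padding to a power-of-two size is an assumption made only to keep the radix-2 FFT short.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(a, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p


def pad(a, h, w):
    """Copy a into the top-left corner of an h-by-w zero matrix."""
    out = [[0j] * w for _ in range(h)]
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            out[i][j] = complex(v)
    return out


def conv2d_fft(a, b):
    """Full linear 2D convolution: forward FFTs, pointwise product, inverse FFT."""
    h = len(a) + len(b) - 1
    w = len(a[0]) + len(b[0]) - 1
    H, W = next_pow2(h), next_pow2(w)  # padding avoids circular wrap-around
    fa = fft2(pad(a, H, W))
    fb = fft2(pad(b, H, W))
    prod = [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(fa, fb)]
    res = fft2(prod, inverse=True)
    # The inverse transform above is unnormalized, so divide by H * W here.
    return [[res[i][j].real / (H * W) for j in range(w)] for i in range(h)]


def conv2d_direct(a, b):
    """Reference convolution by direct summation, used only for checking."""
    h = len(a) + len(b) - 1
    w = len(a[0]) + len(b[0]) - 1
    out = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            for k, brow in enumerate(b):
                for l, u in enumerate(brow):
                    out[i + k][j + l] += v * u
    return out
```

Calling `conv2d_fft(a, b)` on two small nested lists should agree with `conv2d_direct(a, b)` up to floating-point rounding. On a GPU the same three stages (forward transforms, pointwise multiply, inverse transform) would typically be separate library calls and kernels, which is the overhead a fused convolution function would aim to remove.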
March 15, 2012 by hgpu