Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation
Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts
Symposium on Application Accelerators in High Performance Computing, SAAHPC 2011, 2011
@inproceedings{moore2011adaptable,
title={Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation},
author={Moore, N. and Leeser, M. and King, L.S.},
year={2011},
booktitle={Symposium on Application Accelerators in High Performance Computing, SAAHPC 2011}
}
For some classes of problems, the NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well-known solution for the common case of linear image filtering (small fixed templates of a known size applied to a much larger image), the application considered here uses large, arbitrarily sized templates, up to 156-by-116 pixels, with small search spaces containing no more than 703 window positions per template. Our CUDA implementation approach employs template tiling and problem-specific kernel compilation to achieve speedups of up to 15x when compared to an optimized multi-threaded implementation running on a 3.33 GHz four-core Intel Nehalem processor. Tiling the template makes it possible to exploit the parallelism within the computation and to use shared memory. At the same time, problem-specific kernel compilation allows greater levels of adaptability than would otherwise be possible.
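The two ideas named in the abstract, template tiling and problem-specific compilation, can be illustrated with a small CPU-side sketch. This is not the authors' CUDA code: it uses a plain sum-of-products score (the paper's correlation measure may well be normalized), and every name here (`correlate_direct`, `make_tiled_correlator`, the tile sizes, the toy data) is hypothetical. The generated function has the template and tile dimensions baked in as literals and is compiled at run time, loosely mirroring how a kernel could be specialized per problem instance.

```python
def correlate_direct(image, template, row, col):
    """Reference score: sum of products of the template with the
    image window whose top-left corner is at (row, col)."""
    return sum(
        template[r][c] * image[row + r][col + c]
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def make_tiled_correlator(t_rows, t_cols, tile_rows, tile_cols):
    """Generate and compile a correlator specialized for one template size.

    The template and tile dimensions become literal constants in the
    generated source (an assumption made for illustration only).
    """
    source = f'''
def correlate_tiled(image, template, row, col):
    total = 0
    # Walk the template tile by tile; each tile yields a partial sum.
    for tr in range(0, {t_rows}, {tile_rows}):
        for tc in range(0, {t_cols}, {tile_cols}):
            partial = 0
            for r in range(tr, min(tr + {tile_rows}, {t_rows})):
                for c in range(tc, min(tc + {tile_cols}, {t_cols})):
                    partial += template[r][c] * image[row + r][col + c]
            total += partial
    return total
'''
    namespace = {}
    exec(source, namespace)  # runtime compilation of the specialized code
    return namespace["correlate_tiled"]


# Toy data (hypothetical): a 6x7 image and a 3x4 template of small integers.
image = [[(3 * r + 5 * c) % 11 for c in range(7)] for r in range(6)]
template = [[(r + 2 * c) % 4 for c in range(4)] for r in range(3)]

# Specialize for this template size with 2x2 tiles; edge tiles are clipped.
correlate_tiled = make_tiled_correlator(3, 4, 2, 2)

# Small search space: every window position where the template fits.
positions = [(r, c) for r in range(6 - 3 + 1) for c in range(7 - 4 + 1)]
scores = {pos: correlate_tiled(image, template, *pos) for pos in positions}
best_position = max(scores, key=scores.get)
```

Because each tile contributes an independent partial sum, the tiles could in principle be processed in parallel and staged through fast local memory, which is the role tiling plays in the abstract. Baking the sizes into the source lets the loop bounds be constants for each problem instance.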
September 30, 2011 by hgpu