Efficient Shallow Water Simulations on GPUs
Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts
SIAM Conference on Mathematical and Computational Issues in the Geosciences, 2011
@article{brodtkorb2011efficient,
title={Efficient Shallow Water Simulations on GPUs},
author={Brodtkorb, A.R.},
year={2011}
}
For some classes of problems, the NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well-known solution for the common case of linear image filtering (small fixed templates of a known size applied to a much larger image), the application considered here uses large, arbitrarily sized templates, up to 156-by-116 pixels, with small search spaces containing no more than 703 window positions per template. Our CUDA implementation approach employs template tiling and problem-specific kernel compilation to achieve speedups of up to 15x when compared to an optimized multi-threaded implementation running on a 3.33 GHz four-core Intel Nehalem processor. Tiling the template makes it possible to exploit the parallelism within the computation and to use shared memory. At the same time, problem-specific kernel compilation allows greater levels of adaptability than would otherwise be possible.
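The tiling idea in the abstract can be illustrated with a small CPU sketch. This is not the authors' CUDA code: it is plain Python, it assumes the correlation coefficient as the similarity measure, and the names `tile_ranges`, `tiled_ncc`, and `best_match` are hypothetical. It shows only that the template can be cut into tiles whose partial sums are independent and can be reduced afterwards, which is the property that would let separate GPU thread blocks work on separate tiles.

```python
import math


def tile_ranges(length, tile):
    """Split range(length) into consecutive (start, stop) tiles."""
    return [(s, min(s + tile, length)) for s in range(0, length, tile)]


def tiled_ncc(image, template, top, left, tile_h=4, tile_w=4):
    """Correlation coefficient between `template` and the image window whose
    top-left corner is (top, left), accumulated one template tile at a time.

    Assumption: the correlation coefficient stands in for whatever
    correlation measure the original application uses.
    """
    th, tw = len(template), len(template[0])
    n = th * tw
    partials = []
    # Each tile yields independent partial sums; on a GPU these could be
    # produced by separate thread blocks and combined in a reduction step.
    for r0, r1 in tile_ranges(th, tile_h):
        for c0, c1 in tile_ranges(tw, tile_w):
            s_t = s_w = s_tt = s_ww = s_tw = 0.0
            for r in range(r0, r1):
                for c in range(c0, c1):
                    t = template[r][c]
                    w = image[top + r][left + c]
                    s_t += t
                    s_w += w
                    s_tt += t * t
                    s_ww += w * w
                    s_tw += t * w
            partials.append((s_t, s_w, s_tt, s_ww, s_tw))
    # Reduce the per-tile partial sums into totals for the whole template.
    s_t, s_w, s_tt, s_ww, s_tw = [sum(p[i] for p in partials) for i in range(5)]
    num = n * s_tw - s_t * s_w
    den = math.sqrt((n * s_tt - s_t * s_t) * (n * s_ww - s_w * s_w))
    return num / den if den > 0 else 0.0


def best_match(image, template, search_h, search_w):
    """Evaluate every window position in a search_h-by-search_w search space
    and return (best_score, (top, left))."""
    best_score, best_pos = -2.0, None
    for top in range(search_h):
        for left in range(search_w):
            score = tiled_ncc(image, template, top, left)
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_score, best_pos
```

Because the tile size only changes how the sums are grouped, any tile size gives the same score up to floating-point rounding. In a GPU setting the tile dimensions would presumably be chosen to fit shared memory, and fixing the template size at compile time is the kind of specialization that problem-specific kernel compilation would provide.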
November 22, 2011 by hgpu