Acceleration of real-life stencil codes on GPUs
University of Rennes 1
University of Rennes 1, preprint dumas-00636254, 2011
@article{barigou2011acceleration,
title={Acceleration of real-life stencil codes on GPUs},
author={Barigou, Y.},
year={2011}
}
Among the compute intensive applications, the FDTD (Finite-Difference-Time-Domain) allows to solve time-dependent differential equations. This method is commonly used in electromagnetism to find solutions to Maxwell equations. Although considered as powerful and flexible, the FDTD algorithm has the disadvantage of having a huge numerical dispersion due to nested loops, which are loops inside other loops in the same program. Therefore, the execution time required for such methods is often prohibitive, particularly when the number of loop iterations is large, or when coupled with global optimization algorithms such as genetic algorithms cite{genetic}. Nevertheless, there exist some program transformations techniques, such as loop tiling, that can be applied on code, precisely on nested loops, in order to improve its execution time on parallel architectures. This is done by dividing computations into independent blocks, then assigning each block to a processor within a parallel machine. Loop tiling transformation is particularly appropriate for GPUs, because they are massively parallel and powerful. During our internship, we have focused on how to optimize the execution of a FDTD-based application developed by Rolland et al. on GPUs, by applying such transformation. In this report, Section 2 describes the context of the internship by presenting, in general, our target application and reviews the CUDA programming model and loop tiling transformation. Section 3 deals with the recent works related to FDTD on GPUs. Section 4 describes in detail the FDTD algorithm of the target application, and presents the different approaches of loop tiling transformation that can be applied on it.
January 2, 2012 by hgpu