GPU acceleration of Runge Kutta-Fehlberg and its comparison with Dormand-Prince method
School of Informatics & Applied Mathematics, University Malaysia Terengganu, 20130 Kuala Terengganu, Terengganu, Malaysia
AIP Conf. Proc. 1605, 16, 2014
@article{seen2014gpu,
author={Seen, Wo Mei and Gobithaasan, R. U. and Miura, Kenjiro T.},
title={GPU acceleration of Runge Kutta-Fehlberg and its comparison with Dormand-Prince method},
journal={AIP Conference Proceedings},
year={2014},
volume={1605},
pages={16-21},
url={http://scitation.aip.org/content/aip/proceeding/aipcp/10.1063/1.4887558},
doi={http://dx.doi.org/10.1063/1.4887558}
}
The emergence of Graphics Processing Units (GPUs) has significantly reduced processing time and improved performance in computer graphics. GPUs have been developed to surpass Central Processing Units (CPUs) in terms of performance and processing speed. This evolution has opened up a new area of computing and research in which highly parallel GPUs are used for non-graphical algorithms. Simulation and modelling of physical phenomena can be accelerated through General-Purpose Graphics Processing Unit (GPGPU) and Compute Unified Device Architecture (CUDA) implementations. These phenomena can be represented by mathematical models in the form of Ordinary Differential Equations (ODEs), which describe the rate of change of dependent variables with respect to an independent variable. ODEs are numerically integrated over time in order to simulate these behaviours. The classical Runge-Kutta (RK) scheme is a common method for solving ODEs numerically. The Runge-Kutta-Fehlberg (RKF) scheme was developed specifically to provide an estimate of the principal local truncation error at each step, a technique known as embedded error estimation. This paper describes the implementation of the RKF scheme on GPU devices and compares its results with the Dormand-Prince method. Pseudocode is developed to show the implementation in detail, so that practitioners can understand data allocation on the GPU, the formation of RKF kernels, and the flow of data between GPU and CPU upon RKF kernel evaluation. The pseudocode is then implemented in C, and two ODE models are executed to show the achievable speedup compared with a CPU implementation. The accuracy and efficiency of the proposed implementation are discussed in the final section of the paper.
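To illustrate the embedded error estimate mentioned in the abstract, the following is a minimal CPU-only sketch in C of one Runge-Kutta-Fehlberg 4(5) step for a scalar ODE, using the standard Fehlberg coefficients. It is not the paper's pseudocode or its GPU kernels. The names `ode_rhs`, `rkf45_step`, `rkf45_integrate`, and `decay` are hypothetical, and the halve/double step-size rule is a simplification assumed here for brevity.

```c
/* Right-hand side f(t, y) of a scalar ODE y' = f(t, y) (hypothetical type name). */
typedef double (*ode_rhs)(double t, double y);

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* One RKF45 step of size h from (t, y).
 * Returns the 4th-order solution and stores |y5 - y4|, the embedded
 * estimate of the local truncation error, in *err. */
double rkf45_step(ode_rhs f, double t, double y, double h, double *err)
{
    double k1 = h * f(t, y);
    double k2 = h * f(t + h / 4.0, y + k1 / 4.0);
    double k3 = h * f(t + 3.0 * h / 8.0,
                      y + 3.0 / 32.0 * k1 + 9.0 / 32.0 * k2);
    double k4 = h * f(t + 12.0 * h / 13.0,
                      y + 1932.0 / 2197.0 * k1 - 7200.0 / 2197.0 * k2
                        + 7296.0 / 2197.0 * k3);
    double k5 = h * f(t + h,
                      y + 439.0 / 216.0 * k1 - 8.0 * k2
                        + 3680.0 / 513.0 * k3 - 845.0 / 4104.0 * k4);
    double k6 = h * f(t + h / 2.0,
                      y - 8.0 / 27.0 * k1 + 2.0 * k2 - 3544.0 / 2565.0 * k3
                        + 1859.0 / 4104.0 * k4 - 11.0 / 40.0 * k5);

    /* 4th-order and 5th-order solutions share the same six stages. */
    double y4 = y + 25.0 / 216.0 * k1 + 1408.0 / 2565.0 * k3
                  + 2197.0 / 4104.0 * k4 - k5 / 5.0;
    double y5 = y + 16.0 / 135.0 * k1 + 6656.0 / 12825.0 * k3
                  + 28561.0 / 56430.0 * k4 - 9.0 / 50.0 * k5
                  + 2.0 / 55.0 * k6;

    *err = abs_d(y5 - y4);
    return y4;
}

/* Integrate from t0 to t1, accepting a step only when the error
 * estimate is at most tol. Step control is a simple halve/double
 * rule (an assumption of this sketch, not taken from the paper). */
double rkf45_integrate(ode_rhs f, double t0, double y0, double t1, double tol)
{
    double t = t0, y = y0;
    double h = (t1 - t0) / 10.0;

    while (t < t1) {
        int last = (t + h >= t1);
        if (last) h = t1 - t;            /* do not step past t1 */

        double err;
        double y_new = rkf45_step(f, t, y, h, &err);

        if (err > tol) {                 /* reject: retry with half the step */
            h *= 0.5;
            continue;
        }
        y = y_new;                       /* accept the step */
        t = last ? t1 : t + h;
        if (err < tol / 64.0) h *= 2.0;  /* error well below tol: grow step */
    }
    return y;
}

/* Example model (hypothetical): exponential decay y' = -y. */
double decay(double t, double y) { (void)t; return -y; }
```

Calling `rkf45_integrate(decay, 0.0, 1.0, 1.0, 1e-8)` approximates y(1) = exp(-1) for y(0) = 1. The six stage evaluations inside a step depend on each other, so a single scalar ODE offers little to parallelise; how the work is mapped onto GPU threads is a design decision that this sketch does not address.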
July 18, 2014 by hgpu