Explicit Fourth-Order Runge-Kutta Method on Intel Xeon Phi Coprocessor
Department of Computer Science, Maria Curie-Sklodowska University, Plac M. Curie-Sklodowskiej 1, 20-031 Lublin, Poland
International Journal of Parallel Programming, pp 1-18, 2016
@article{bylina2016explicit,
title={Explicit Fourth-Order Runge–Kutta Method on Intel Xeon Phi Coprocessor},
author={Bylina, Beata and Potiopa, Joanna},
journal={International Journal of Parallel Programming},
pages={1--18},
year={2016},
publisher={Springer}
}
This paper concerns an Intel Xeon Phi implementation of the explicit fourth-order Runge-Kutta method (RK4) for very sparse matrices with very short rows. Such matrices arise in Markovian modeling of computer and telecommunication networks. We investigate two implementations, both using the CSR storage scheme and running on Intel Xeon Phi: one based on Intel Math Kernel Library (Intel MKL) routines and the authors’ own. The Intel MKL-based implementation uses the high-performance BLAS and Sparse BLAS routines. In our own implementation we focus on OpenMP-style programming: we implement the sparse matrix-vector multiplication (SpMV) operation and vector addition using basic optimization techniques and vectorization. We evaluate our approach in native and offload modes for various numbers of cores and thread affinity settings. The two implementations are compared with respect to time, speedup, and performance. The numerical experiments on Intel Xeon Phi show that the performance of the authors’ implementation is very promising: it is up to two times faster than the multithreaded Intel MKL-based implementation running on a CPU (Intel Xeon processor), and up to three times faster than the Intel MKL-based application running on Intel Xeon Phi.
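To make the two kernels named in the abstract concrete, here is a minimal sketch of one RK4 step built only from a CSR SpMV and a vector addition. It is not the authors' code: it is plain serial Python rather than OpenMP on Xeon Phi, it assumes a linear system y' = Ay (the abstract does not state the exact form of the equations), and all names (`csr_spmv`, `axpy`, `rk4_step`) and the 2x2 test matrix are hypothetical.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def axpy(alpha, x, y):
    """Return alpha * x + y (the vector addition kernel)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def rk4_step(values, col_idx, row_ptr, y, h):
    """Advance y' = A y by one classical RK4 step of size h."""
    k1 = csr_spmv(values, col_idx, row_ptr, y)
    k2 = csr_spmv(values, col_idx, row_ptr, axpy(h / 2, k1, y))
    k3 = csr_spmv(values, col_idx, row_ptr, axpy(h / 2, k2, y))
    k4 = csr_spmv(values, col_idx, row_ptr, axpy(h, k3, y))
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


# Hypothetical example: transposed generator of a 2-state Markov chain,
# A = [[-1, 2], [1, -2]], stored in CSR. Its columns sum to zero, so the
# total probability is conserved along the trajectory.
values = [-1.0, 2.0, 1.0, -2.0]
col_idx = [0, 1, 0, 1]
row_ptr = [0, 2, 4]

y = [1.0, 0.0]          # start in state 0 with probability 1
h = 0.01
for _ in range(1000):   # integrate up to t = 10
    y = rk4_step(values, col_idx, row_ptr, y, h)
```

Each step costs four SpMV calls plus a handful of vector updates, so for matrices with very short rows the run time is dominated by how well those two kernels are threaded and vectorized.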
October 4, 2016 by hgpu