Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Raymond Flagg, Jason Monk, Yifeng Zhu, Bruce Segee
Department of Electrical and Computer Engineering, University of Maine, Orono, ME, USA
The 2013 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’13), 2013








Solving systems of linear equations is an important problem that spans almost all fields of science and mathematics. When these systems grow large, iterative methods are used to solve them. This paper looks at optimizing these methods for CUDA architectures. It discusses a multi-threaded CPU implementation, a GPU implementation, and a data-optimized GPU implementation. The optimized version uses an extra kernel to rearrange the problem data so that memory accesses and thread divergence are minimized. The normal GPU implementation achieved a total speedup of 1.60X over the CPU version, whereas the optimized version achieved a total speedup of 1.78X. This paper demonstrates the importance of pre-organizing the data in iterative methods and the impact of doing so on performance.
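The abstract does not say which iterative method or which data layout the authors use, so the following is only a minimal sketch of the general idea, assuming a dense Jacobi iteration. The names `rearrange`, `jacobi_step`, and `solve` are hypothetical and do not come from the paper. A one-time pre-pass plays the role of the extra rearranging kernel: it strips the diagonal out of the matrix, so the inner loop needs no per-element branch, and stores the remaining entries column-major, so that neighbouring rows (one GPU thread each in a CUDA version) would read adjacent memory locations at every step.

```python
def rearrange(A, b):
    """One-time pre-pass (assumed layout, not the paper's): split off the
    diagonal and store the off-diagonal entries column-major."""
    n = len(A)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    # off[j * n + i] holds A[i][j] for i != j and 0.0 in the diagonal slot,
    # so "thread" i and "thread" i + 1 touch adjacent elements at each step j.
    off = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            if i != j:
                off[j * n + i] = A[i][j]
    return inv_diag, off, list(b)


def jacobi_step(inv_diag, off, b, x):
    """One Jacobi sweep. The loop over i stands in for one GPU thread per row."""
    n = len(b)
    x_new = [0.0] * n
    for i in range(n):
        acc = b[i]
        # No `if j != i` test here: the diagonal slot was zeroed in the
        # pre-pass, so every "thread" follows the same control flow.
        for j in range(n):
            acc -= off[j * n + i] * x[j]
        x_new[i] = acc * inv_diag[i]
    return x_new


def solve(A, b, iterations=50):
    """Rearrange once, then iterate a fixed number of times from x = 0."""
    inv_diag, off, rhs = rearrange(A, b)
    x = [0.0] * len(b)
    for _ in range(iterations):
        x = jacobi_step(inv_diag, off, rhs, x)
    return x
```

Calling `solve(A, b)` on a diagonally dominant matrix converges toward the solution of `A x = b`. The cost of `rearrange` is paid once and amortized over all iterations, which is why pre-organizing data suits iterative methods in particular.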
