Balancing locality and concurrency: solving sparse triangular systems on GPUs

Andrea Picciau, Gordon E. Inggs, John Wickerson, Eric C. Kerrigan, George A. Constantinides
Department of Electrical and Electronic Engineering, Imperial College London, UK
IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC ’16), 2016

Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). Two approaches have been used to accelerate the solution of such systems. On GPUs, concurrency has been prioritised at the expense of data locality. On multi-core CPUs, data locality has been prioritised at the expense of concurrency. In this paper, we discuss the interaction between data locality and concurrency in the solution of STLs on GPUs, and we present a new algorithm that balances the two. We demonstrate empirically that, provided the input matrix offers enough concurrency, our algorithm outperforms the concurrency-prioritising solver in Nvidia's CUSPARSE library, with a maximum speedup of 5.8-fold in our experiments. Our algorithm, which we have implemented in OpenCL, requires a pre-processing phase that partitions the graph associated with the input matrix into sub-graphs whose data can be stored in low-latency local memories. This analysis phase is expensive, but because it depends only on the input matrix, its cost can be amortised when solving for many different right-hand sides.
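To make the "analyse once, solve many times" structure concrete, here is a minimal sketch under stated assumptions. It is not the authors' locality-balancing algorithm and not the CUSPARSE implementation. It uses level scheduling, a standard way of exposing concurrency in a sparse triangular solve: rows in the same level do not depend on each other, so a GPU could process them in parallel. The matrix is assumed to be lower triangular, stored in CSR form (`indptr`, `indices`, `data`), with every diagonal entry stored and nonzero. The function names `level_sets` and `solve_lower` are hypothetical.

```python
from collections import defaultdict


def level_sets(n, indptr, indices):
    """Analysis phase (hypothetical helper): group rows of a lower-triangular
    CSR matrix into levels. Row i depends on each row j < i with L[i, j] != 0,
    so its level is one more than the deepest level it depends on."""
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # off-diagonal entry: a data dependency on row j
                level[i] = max(level[i], level[j] + 1)
    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in range(max(level) + 1)] if n else []


def solve_lower(levels, indptr, indices, data, b):
    """Solve phase (hypothetical helper): forward substitution, one level at
    a time. Rows inside a level are independent; they run sequentially here,
    whereas a GPU version would assign them to parallel threads."""
    x = [0.0] * len(b)
    for rows in levels:
        for i in rows:
            s = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]  # assumed stored and nonzero
                else:
                    s -= data[k] * x[j]  # x[j] is final: j is in an earlier level
            x[i] = s / diag
    return x


# Example 4x4 lower-triangular matrix in CSR form:
#   [[2, 0, 0, 0],
#    [1, 3, 0, 0],
#    [0, 0, 4, 0],
#    [0, 1, 1, 2]]
indptr = [0, 1, 3, 4, 7]
indices = [0, 0, 1, 2, 1, 2, 3]
data = [2.0, 1.0, 3.0, 4.0, 1.0, 1.0, 2.0]

levels = level_sets(4, indptr, indices)  # computed once: [[0, 2], [1], [3]]
for b in ([2.0, 4.0, 8.0, 7.0], [4.0, 5.0, 4.0, 9.0]):
    print(solve_lower(levels, indptr, indices, data, b))
# prints [1.0, 1.0, 2.0, 2.0] then [2.0, 1.0, 1.0, 3.5]
```

The analysis result `levels` depends only on the sparsity pattern, so its cost is paid once and reused for every right-hand side. The number of rows per level indicates how much concurrency the matrix offers: a matrix whose levels each contain only one or two rows leaves little for a GPU to parallelise.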