Performance Comparison of Cholesky Decomposition on GPUs and FPGAs

Depeng Yang, Junqing Sun, JunKu Lee, Getao Liang, David D. Jenkins, Gregory D. Peterson, and Husheng Li
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996
Symposium on Application Accelerators in High Performance Computing, 2010








Cholesky decomposition is widely used to factor symmetric positive definite matrices when solving least squares problems. Various parallel accelerators, including GPUs and FPGAs, have been explored to improve its performance. In this paper, Cholesky decomposition is implemented on both FPGAs and GPUs by designing a dedicated architecture for FPGAs and exploiting massively parallel computation on GPUs. The performance of Cholesky decomposition on GPUs, CPUs, FPGAs, and hybrid systems is compared in both single and double precision. Results show that the FPGA implementation has the highest efficiency with respect to clock cycles compared with our pure GPU implementation, a hybrid system with MAGMA, and a CPU with LAPACK. The GPU implementation outperforms the MAGMA and LAPACK implementations for small matrices, and the hybrid system with MAGMA is the best for larger matrices.
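
For readers unfamiliar with the factorization itself, the following is a minimal sequential sketch in plain Python. It is not the authors' FPGA or GPU implementation, and the function names `cholesky` and `solve_spd` are hypothetical, chosen only for illustration. It factors a symmetric positive definite matrix A as A = L L^T and then solves A x = b by forward and back substitution, which is the step a least squares solver would apply to the normal equations.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    `a` is a list of rows and is assumed to be symmetric positive definite.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the squares of the entries already computed in row j.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j do not depend on one another,
        # so this loop is a natural candidate for parallel execution.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_spd(a, b):
    """Solve a x = b for symmetric positive definite `a` using its Cholesky factor."""
    L = cholesky(a)
    n = len(b)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

Calling `solve_spd(A, b)` with a small symmetric positive definite `A` returns the solution vector as a list of floats. The sketch does O(n^3) work in pure Python and is meant only to show the data dependencies, not to be fast.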