
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

Jakub Kurzak, P. Luszczek, Mathieu Faverge, Jack J. Dongarra
Electrical Engineering and Computer Science, University of Tennessee
10th International Meeting on High-Performance Computing for Computational Science (VECPAR), 2012

@inproceedings{kurzak2012lu,
   title={LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System},
   author={Kurzak, Jakub and Luszczek, P and Faverge, Mathieu and Dongarra, Jack J and others},
   booktitle={VECPAR 2012-10th International Meeting on High-Performance Computing for Computational Science},
   year={2012}
}


LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance Linpack benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The optimizations include lookahead, dynamic task scheduling, fine-grained parallelism for memory-bound operations, autotuning, and a data layout geared towards complex memory hierarchies. Performance in excess of one Tflop/s is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
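
For readers unfamiliar with the underlying algorithm, the following is a minimal sketch of textbook LU factorization with partial pivoting, written as sequential, unblocked Python. It is not the paper's hybrid CPU/GPU implementation and uses none of the optimizations listed in the abstract; the function name `lu_partial_pivot` and the returned pivot format are assumptions chosen for illustration.

```python
def lu_partial_pivot(a):
    """Factor a square matrix (list of lists) so that P*A = L*U.

    Returns (lu, piv). lu stores U on and above the diagonal and the
    multipliers of the unit lower triangular L below it. piv[k] is the
    index of the row that was swapped with row k at step k.
    Illustrative sketch only: unblocked and sequential.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy, leave the input intact
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the largest-magnitude entry in column k
        # among rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, n):
            # Multiplier that eliminates entry (i, k); stored in place as L.
            lu[i][k] /= lu[k][k]
            # Update the rest of row i in the trailing submatrix.
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


if __name__ == "__main__":
    lu, piv = lu_partial_pivot([[2.0, 1.0, 1.0],
                                [4.0, 3.0, 3.0],
                                [8.0, 7.0, 9.0]])
    print(piv)
    for row in lu:
        print(row)
```

In this sketch the pivot search and column scaling touch one column at a time, while the inner update touches the whole trailing submatrix, which is where most of the arithmetic is spent as the matrix grows.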

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
