
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

Jakub Kurzak, P. Luszczek, Mathieu Faverge, Jack J. Dongarra
Electrical Engineering and Computer Science, University of Tennessee
10th International Meeting on High-Performance Computing for Computational Science (VECPAR), 2012

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance Linpack benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The optimizations include lookahead, dynamic task scheduling, fine-grained parallelism for memory-bound operations, autotuning, and a data layout geared towards complex memory hierarchies. Performance in excess of 1 Tflop/s is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
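For readers unfamiliar with the underlying procedure, the following is a minimal serial sketch of LU factorization with partial pivoting in plain Python. It is not the authors' implementation and includes none of the optimizations listed above (no blocking, lookahead, task scheduling, or GPU offload). The function name `lu_partial_pivot` and the list-of-lists matrix representation are assumptions made for illustration only.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that P*A = L*U (illustrative sketch).

    Returns (lu, perm): `lu` stores U on and above the diagonal and the
    multipliers of L (unit diagonal implied) below it; `perm[i]` is the
    index of the original row that ended up in row i.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))

    for k in range(n):
        # Partial pivoting: choose the row at or below the diagonal with
        # the largest absolute value in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]

        # Eliminate column k below the diagonal and update the trailing
        # submatrix (the part that dominates the run time for large n).
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]

    return lu, perm


if __name__ == "__main__":
    lu, perm = lu_partial_pivot([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
    print(perm)
    for row in lu:
        print(row)
```

The pivot search and row swap operate on a single column and are memory-bound, while the trailing submatrix update is where most of the arithmetic happens; this split is the usual reason the two phases are treated differently in high-performance implementations.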
