high performance computing on graphics processing units: hgpu.org


LU Factorization with Partial Pivoting for a Multicore System with Accelerators

Jakub Kurzak, Piotr Luszczek, Mathieu Faverge, Jack Dongarra

Department of Electrical Engineering and Computer Science, University of Tennessee

IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 8, pp. 1613-1621, 2013

DOI:10.1109/TPDS.2012.242

@article{kurzak2013lu,
  title={LU Factorization with Partial Pivoting for a Multicore System with Accelerators},
  author={Kurzak, Jakub and Luszczek, Piotr and Faverge, Mathieu and Dongarra, Jack},
  year={2013},
  publisher={IEEE}
}


LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This paper presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. The difficulty of implementing the algorithm for such a system lies in the disparity between the computational power of the CPUs and that of the GPUs, and in the meager bandwidth of the communication link between their memory systems. An additional challenge comes from the memory-bound and synchronization-rich nature of the panel factorization component of the block LU algorithm, which is imposed by the use of partial pivoting. The challenges are tackled with a data layout geared toward complex memory hierarchies, autotuning of GPU kernels, fine-grained parallelization of memory-bound CPU operations, and dynamic scheduling of tasks to different devices. Performance in excess of one TeraFLOPS is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
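For readers unfamiliar with the block LU algorithm mentioned in the abstract, the following is a minimal serial sketch in pure Python, not the paper's implementation. It only illustrates the three phases of a right-looking blocked factorization: the panel factorization (where the pivot search and row swaps make each column depend on the previous one), the triangular solve, and the trailing matrix update. The names `lu_blocked`, `reconstruct`, and the block size parameter `nb` are hypothetical; the tile data layout, GPU kernels, and dynamic scheduling described in the paper are not modeled here.

```python
def lu_blocked(a, nb=2):
    """Right-looking blocked LU with partial pivoting, in place.

    `a` is a square matrix stored as a list of row lists. On return it holds
    L below the diagonal (unit diagonal implied) and U on and above it.
    Returns `perm`, where row i of the permuted matrix is original row perm[i].
    """
    n = len(a)
    perm = list(range(n))
    for k in range(0, n, nb):
        end = min(k + nb, n)  # one past the last column of this panel
        # 1) Panel factorization: unblocked elimination restricted to the
        #    panel columns. Each column needs a pivot search and a swap.
        for j in range(k, end):
            p = max(range(j, n), key=lambda i: abs(a[i][j]))
            if a[p][j] == 0.0:
                raise ValueError("matrix is singular")
            if p != j:
                a[j], a[p] = a[p], a[j]  # swaps the full rows
                perm[j], perm[p] = perm[p], perm[j]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]  # multiplier, stored in place of L
                for c in range(j + 1, end):
                    a[i][c] -= a[i][j] * a[j][c]
        # 2) Triangular solve: U12 = inverse(L11) * A12 by forward substitution.
        for j in range(k, end):
            for i in range(j + 1, end):
                for c in range(end, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # 3) Trailing update: A22 -= L21 * U12 (the compute-bound part).
        for i in range(end, n):
            for j in range(k, end):
                lij = a[i][j]
                for c in range(end, n):
                    a[i][c] -= lij * a[j][c]
    return perm


def reconstruct(lu):
    """Multiply the packed L and U factors back together."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for t in range(min(i, j) + 1):
                l_it = 1.0 if t == i else lu[i][t]
                s += l_it * lu[t][j]
            out[i][j] = s
    return out


# Usage: factor a small matrix and measure how well L*U matches P*A.
A = [[2.0, 1.0, 1.0, 0.0],
     [4.0, 3.0, 3.0, 1.0],
     [8.0, 7.0, 9.0, 5.0],
     [6.0, 7.0, 9.0, 8.0]]
lu = [row[:] for row in A]
perm = lu_blocked(lu, nb=2)
product = reconstruct(lu)
residual = max(abs(product[i][j] - A[perm[i]][j])
               for i in range(4) for j in range(4))
print("row permutation:", perm, "max residual:", residual)
```

In this sketch the panel phase touches a tall, narrow block and must finish one column before starting the next, which is one way to see why the abstract calls it memory-bound and synchronization-rich, whereas the trailing update is a matrix multiplication that maps naturally onto an accelerator.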

Tags: Algorithms, Computer science, CUDA, Factorization, nVidia, Tesla S2050

July 29, 2013 by hgpu
