https://hgpu.org/?p=9177
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System