high performance computing on graphics processing units: hgpu.org

A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures

Mitch Horton, Stanimire Tomov, Jack Dongarra

Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996

Symposium on Application Accelerators in High-Performance Computing (SAAHPC), 2011

DOI:10.1109/SAAHPC.2011.18

@inproceedings{horton2011class,
  title={A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures},
  author={Horton, M. and Tomov, S. and Dongarra, J.},
  booktitle={Application Accelerators in High-Performance Computing (SAAHPC), 2011 Symposium on},
  pages={150--158},
  year={2011},
  organization={IEEE}
}

Three of the top four supercomputers in the November 2010 TOP500 list of the world’s most powerful supercomputers use NVIDIA GPUs to accelerate computations. Ninety-five systems on the list use processors with six or more cores, 365 use quad-core processors, and 37 use dual-core processors. Work to enable hybrid graphics processing unit (GPU)-based multicore platforms for computational science at large scale, by developing fundamental numerical libraries for them (in particular, dense linear algebra libraries), has been underway for some time. We present a class of algorithms based largely on software infrastructures that have already been developed for homogeneous multicores and hybrid GPU-based computing. The algorithms extend what is currently available in the Matrix Algebra for GPU and Multicore Architectures (MAGMA) Library for performing Cholesky, QR, and LU factorizations using a single core or socket and a single GPU. The extensions occur in two areas. First, panels that were previously factored on the CPU using LAPACK are instead factored in parallel on some number of CPU cores, using a highly optimized algorithm with dynamic, asynchronous scheduling. Second, the remaining CPU cores are used to update the rightmost panels of the matrix in parallel.
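As a rough illustration of the panel/update split the abstract describes, the sketch below shows a sequential, right-looking blocked Cholesky factorization in plain Python. It is not MAGMA code and not the authors' algorithm. The function names `cholesky_unblocked` and `blocked_cholesky` and the block-size parameter `nb` are made up for this example, and the comments about where each step would run in a hybrid setting are assumptions based only on the description above.

```python
import math


def cholesky_unblocked(a):
    """Factor a small symmetric positive definite block; return its lower factor."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def blocked_cholesky(a, nb):
    """Right-looking blocked Cholesky: return lower-triangular L with A = L L^T.

    Only the lower triangle of `a` is read. `nb` is the panel width.
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy, overwritten block by block
    for k in range(0, n, nb):
        e = min(k + nb, n)

        # Step 1: factor the diagonal block of the current panel.
        # (Assumption: in a hybrid scheme this panel work stays on CPU cores.)
        lkk = cholesky_unblocked([row[k:e] for row in w[k:e]])
        for i in range(k, e):
            for j in range(k, e):
                w[i][j] = lkk[i - k][j - k]

        # Step 2: triangular solve for the rows below the diagonal block,
        # L_ik = A_ik * L_kk^{-T}, one row at a time.
        for i in range(e, n):
            for j in range(k, e):
                s = w[i][j] - sum(w[i][p] * w[j][p] for p in range(k, j))
                w[i][j] = s / w[j][j]

        # Step 3: trailing-matrix update A22 -= L21 * L21^T (lower part only).
        # (Assumption: this is the bulk of the work that a hybrid scheme
        # would divide between the GPU and any spare CPU cores.)
        for i in range(e, n):
            for j in range(e, i + 1):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k, e))

    # Discard whatever is left in the strict upper triangle.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

For the matrix `[[4, 12, -16], [12, 37, -43], [-16, -43, 98]]`, calling `blocked_cholesky(a, 2)` yields the factor `[[2, 0, 0], [6, 1, 0], [-8, 5, 3]]`. The three steps are independent enough that step 1 of the next panel can start as soon as its columns have been updated, which is the kind of overlap that dynamic scheduling can exploit.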

Tags: Algorithms, Computer science, Factorization, Linear Algebra, nVidia, nVidia GeForce GTX 480, Tesla M2070

November 8, 2011 by hgpu
