high performance computing on graphics processing units: hgpu.org

CULA: hybrid GPU accelerated linear algebra routines

John R. Humphrey, Daniel K. Price, Kyle E. Spagnoli, Aaron L. Paolini, Eric J. Kelmelis

EM Photonics, Inc., 51 E Main St, Suite 203, Newark, DE, 19711, USA

Modeling and Simulation for Defense Systems and Applications V. Edited by Kelmelis, Eric J. Proceedings of the SPIE, Volume 7705, pp. 770502-770502-7 (2010).

DOI:10.1117/12.850538

@conference{humphrey2010cula,
  title={CULA: hybrid GPU accelerated linear algebra routines},
  author={Humphrey, J.R. and Price, D.K. and Spagnoli, K.E. and Paolini, A.L. and Kelmelis, E.J.},
  booktitle={Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series},
  volume={7705},
  pages={1},
  issn={0277-786X},
  year={2010}
}

The modern graphics processing unit (GPU) found in many standard personal computers is a highly parallel math processor capable of nearly 1 TFLOPS peak throughput at a cost similar to a high-end CPU and with an excellent FLOPS/watt ratio. High-level linear algebra operations are computationally intense, often requiring O(N^3) operations, and would seem a natural fit for the processing power of the GPU. Our work is on CULA, a GPU-accelerated implementation of linear algebra routines. We present results from factorizations such as LU decomposition, singular value decomposition, and QR decomposition, along with applications like system solution and least squares. The GPU execution model featured by NVIDIA GPUs based on CUDA demands very strong parallelism, requiring between hundreds and thousands of simultaneous operations to achieve high performance. Some constructs from linear algebra map extremely well to the GPU and others map poorly. CPUs, on the other hand, do well at smaller-order parallelism and perform acceptably during low-parallelism code segments. Our work addresses this via a hybrid processing model, in which the CPU and GPU work simultaneously to produce results. In many cases, this is accomplished by allowing each platform to do the work it performs most naturally.
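The division of labor described in the abstract can be pictured with a blocked LU factorization. The sketch below is only an illustration and is not CULA's implementation: it runs entirely on the CPU in plain Python, the function names (`factor_panel`, `solve_row_block`, `update_trailing`, `blocked_lu`) are hypothetical, and the comments marking a step as "CPU-like" or "GPU-like" reflect an assumed split of the kind commonly used in hybrid schemes. Pivoting is omitted for brevity, so the sketch assumes every pivot is nonzero.

```python
# Illustrative sketch only: blocked LU factorization without pivoting.
# The CPU/GPU labels are assumptions about how a hybrid scheme might divide
# the work; nothing here calls CULA or CUDA.

def factor_panel(A, k, kb):
    """Unblocked LU of the tall panel (rows k..n-1, columns k..k+kb-1).

    Each column depends on the previous one, so parallelism is limited:
    the kind of step a hybrid scheme might keep on the CPU.
    """
    n = len(A)
    for j in range(k, k + kb):
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]                  # multiplier l_ij (assumes nonzero pivot)
            for c in range(j + 1, k + kb):
                A[i][c] -= A[i][j] * A[j][c]    # update stays inside the panel


def solve_row_block(A, k, kb):
    """Forward substitution with the unit lower block L11 to form U12."""
    n = len(A)
    for j in range(k, k + kb):
        for i in range(j + 1, k + kb):
            for c in range(k + kb, n):
                A[i][c] -= A[i][j] * A[j][c]


def update_trailing(A, k, kb):
    """Trailing update A22 -= L21 * U12.

    Every entry of A22 can be updated independently, which is the kind of
    massively parallel step a hybrid scheme might hand to the GPU.
    """
    n = len(A)
    for i in range(k + kb, n):
        for c in range(k + kb, n):
            A[i][c] -= sum(A[i][p] * A[p][c] for p in range(k, k + kb))


def blocked_lu(A, block=2):
    """Return a copy of A overwritten with L (below diagonal, unit) and U."""
    A = [row[:] for row in A]                   # leave the caller's matrix untouched
    n = len(A)
    for k in range(0, n, block):
        kb = min(block, n - k)                  # last panel may be narrower
        factor_panel(A, k, kb)
        solve_row_block(A, k, kb)
        update_trailing(A, k, kb)
    return A
```

Calling `blocked_lu(A, block=2)` on a square list-of-lists matrix returns the packed factors. For large N the trailing update accounts for most of the O(N^3) arithmetic, which is why it is the natural candidate for the highly parallel processor in this assumed split.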

Tags: Computer science, CUDA, CULA, Linear Algebra, nVidia, Package, Tesla C1060

April 11, 2011 by hgpu
