high performance computing on graphics processing units: hgpu.org

Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators

Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Jean Roman, Samuel Thibault, Stanimire Tomov

INRIA, LaBRI, University of Bordeaux

Symposium on Application Accelerators in High Performance Computing, 2010

Although hardware has changed dramatically in the last few years, nodes of multicore chips augmented by Graphics Processing Units (GPUs) seem to be a trend of major importance. Previous approaches for scheduling dense linear algebra operations on such a complex node achieved high performance, but at a twofold cost: they did not use the potential of all the cores, and they produced static, non-generic code. In this extended abstract, we present a new approach for scheduling dense linear algebra operations on multicore architectures with GPU accelerators, using a dynamic scheduler capable of exploiting the full potential of the node [1]. We underline the benefits in terms of both programmability and performance. We illustrate our approach with a Cholesky factorization relying on cutting-edge GPU and CPU kernels [2], [3], achieving roughly 900 Gflop/s on an eight-core node accelerated with three NVIDIA Tesla GPUs.
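To make the idea of dynamic scheduling concrete, here is a minimal sketch of a tile Cholesky factorization driven by a dependency-aware ready queue. It is not the authors' runtime or kernels: the function names (`potrf`, `trsm`, `update`, `build_tasks`, `to_tiles`, `run_dynamic`) are hypothetical, the kernels are plain Python on small tiles, and execution is single-threaded. The sketch assumes a lower-triangular, right-looking tile algorithm and infers dependencies from which tiles each task reads and writes.

```python
import math
from collections import deque

def potrf(a):
    """In-place lower Cholesky factor of one dense tile."""
    n = len(a)
    for j in range(n):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            a[i][j] = (a[i][j] - sum(a[i][k] * a[j][k] for k in range(j))) / a[j][j]
        for i in range(j):
            a[i][j] = 0.0  # clear the strictly upper part

def trsm(l, b):
    """In-place B <- B * L^-T (forward substitution on each row of B)."""
    for row in b:
        for j in range(len(l)):
            row[j] = (row[j] - sum(l[j][k] * row[k] for k in range(j))) / l[j][j]

def update(a, b, c):
    """In-place C <- C - A * B^T (SYRK when a is b, GEMM otherwise)."""
    for i in range(len(c)):
        for j in range(len(c[0])):
            c[i][j] -= sum(a[i][k] * b[j][k] for k in range(len(a[0])))

def build_tasks(nt):
    """List tile-Cholesky tasks in sequential order as (kernel, reads, write)."""
    tasks = []
    for k in range(nt):
        tasks.append((potrf, [], (k, k)))
        for i in range(k + 1, nt):
            tasks.append((trsm, [(k, k)], (i, k)))
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                tasks.append((update, [(i, k), (j, k)], (i, j)))
    return tasks

def to_tiles(a, nb):
    """Split the lower triangle of a square matrix into nb x nb tiles."""
    nt = len(a) // nb
    tiles = {(i, j): [row[j * nb:(j + 1) * nb] for row in a[i * nb:(i + 1) * nb]]
             for i in range(nt) for j in range(i + 1)}
    return tiles, nt

def run_dynamic(tiles, nt):
    """Run tasks as their data dependencies resolve; return execution order."""
    tasks = build_tasks(nt)
    last_writer, readers = {}, {}
    succ = [[] for _ in tasks]
    pending = [0] * len(tasks)
    for t, (_, reads, write) in enumerate(tasks):
        deps = {last_writer[r] for r in reads if r in last_writer}
        if write in last_writer:
            deps.add(last_writer[write])
        deps.update(readers.get(write, ()))  # write-after-read hazards
        for d in deps:
            succ[d].append(t)
        pending[t] = len(deps)
        for r in reads:
            readers.setdefault(r, set()).add(t)
        last_writer[write] = t
        readers[write] = set()
    ready = deque(t for t in range(len(tasks)) if pending[t] == 0)
    order = []
    while ready:
        t = ready.pop()  # any ready task may run; LIFO is an arbitrary choice
        kernel, reads, write = tasks[t]
        kernel(*[tiles[r] for r in reads], tiles[write])
        order.append(t)
        for s in succ[t]:
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return order
```

In this sketch the execution order is decided at run time by the ready queue rather than fixed in the code. A real runtime would hand each ready task to whichever CPU core or GPU is available and would pick a kernel implementation suited to that device.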

Tags: Computer science, CUDA, Linear Algebra, nVidia, nVidia Quadro FX 5800, Task scheduling

February 17, 2011 by hgpu


