high performance computing on graphics processing units: hgpu.org


Automatically Tuned Dense Linear Algebra for Multicore+GPU

Xing Fu, Xue Li, Gregory D. Peterson

Department of Electrical Engineering and Computer Science, The University of Tennessee, Knoxville

Symposium on Application Accelerators in High Performance Computing, 2010


Package: MAGMA 1.0 RC3


The Multicore+GPU architecture has been adopted in some of the fastest supercomputers listed on the TOP500. The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures such as Multicore+GPU. However, achieving portable performance currently requires manual parameter tuning. This paper presents an automatically tuned LU factorization: its key parameter is tuned automatically to optimize performance for a particular GPU platform. Moreover, we propose a work-stealing scheme and GREEN-synchronization to decrease the power consumption of the LU factorization and accelerate the entire application.
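To make the idea of tuning a key LU parameter concrete, here is a minimal sketch of empirical autotuning. It rests on assumptions not stated in the abstract: that the tuned parameter is the block (panel) width, here called `nb`, and that tuning is done by timing each candidate value and keeping the fastest. The functions `blocked_lu`, `make_matrix`, and `autotune_block_size` are hypothetical names for illustration, not part of MAGMA, and the code runs on the CPU in pure Python rather than on a Multicore+GPU system.

```python
import random
import time


def blocked_lu(a, nb):
    """In-place right-looking blocked LU without pivoting.

    `a` is a square list of lists; `nb` is the block (panel) width.
    On return, `a` holds L below the diagonal (unit diagonal implied)
    and U on and above the diagonal.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Step 1: factor the panel (columns k0..k1-1, rows k0..n-1).
        for k in range(k0, k1):
            pivot = a[k][k]
            row_k = a[k]
            for i in range(k + 1, n):
                row_i = a[i]
                row_i[k] /= pivot
                lik = row_i[k]
                for j in range(k + 1, k1):
                    row_i[j] -= lik * row_k[j]
        # Step 2: compute the U block to the right of the panel.
        for k in range(k0, k1):
            row_k = a[k]
            for i in range(k + 1, k1):
                row_i = a[i]
                lik = row_i[k]
                for j in range(k1, n):
                    row_i[j] -= lik * row_k[j]
        # Step 3: update the trailing submatrix, A22 -= L21 * U12.
        for i in range(k1, n):
            row_i = a[i]
            for k in range(k0, k1):
                lik = row_i[k]
                row_k = a[k]
                for j in range(k1, n):
                    row_i[j] -= lik * row_k[j]
    return a


def make_matrix(n, seed=0):
    """Random diagonally dominant matrix, so skipping pivoting is safe."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        a[i][i] += n
    return a


def autotune_block_size(n, candidates, repeats=3):
    """Time each candidate block size and return (best_nb, timings)."""
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):
            a = make_matrix(n)
            start = time.perf_counter()
            blocked_lu(a, nb)
            best = min(best, time.perf_counter() - start)
        timings[nb] = best
    return min(timings, key=timings.get), timings
```

Calling `autotune_block_size(64, [4, 8, 16, 32])` returns the fastest candidate on the current machine together with the measured times. In interpreted Python the differences between block sizes are small; on a real hybrid system the block size would more plausibly affect how work is split between CPU panel factorization and GPU trailing updates, which is why a per-platform search of this kind can pay off.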

Tags: Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce GTX 280, Package

February 17, 2011 by hgpu

