high performance computing on graphics processing units: hgpu.org

Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures

Thierry Gautier, Joao V. F. Lima, Nicolas Maillard, Bruno Raffin

Federal University of Rio Grande do Sul (UFRGS), Brazil

hal-00780890, 24 January 2013

Package: XKaapi

Most recent HPC platforms have heterogeneous nodes that combine multi-core CPUs with accelerators such as GPUs. Scheduling on such architectures relies on static partitioning and a cost model. In this paper, we present a locality-aware work stealing scheduler for multi-CPU and multi-GPU architectures, built on the XKaapi runtime system. We report performance results for two dense linear algebra kernels, Cholesky (POTRF) and LU (GETRF) factorization, to evaluate our scheduler on a heterogeneous architecture composed of two hexa-core CPUs and eight NVIDIA Fermi GPUs. Our experiments show that online locality-aware scheduling achieves performance as good as static strategies and, in most cases, outperforms them.
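To make the idea of locality-aware work stealing concrete, here is a minimal single-threaded sketch in Python. It is not the XKaapi API and not the heuristic from the paper: the names `Task`, `Worker`, `steal`, and `run`, the tile labels, and the rule "prefer a task whose data tile is already resident on the thief" are all assumptions made for illustration. An idle worker first scans a victim's queue for such a task and falls back to classic oldest-first stealing only if none exists.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    tile: str  # hypothetical label of the data block the task touches


@dataclass
class Worker:
    name: str
    resident: set = field(default_factory=set)   # tiles already in this worker's memory
    queue: deque = field(default_factory=deque)  # owner pops right, thieves take from left

    def pop_local(self):
        return self.queue.pop() if self.queue else None


def steal(thief, victims):
    """Steal one task, preferring a task whose tile the thief already holds."""
    # First pass: look for a task whose data is local to the thief.
    for victim in victims:
        for task in victim.queue:  # scan from the oldest end
            if task.tile in thief.resident:
                victim.queue.remove(task)
                return task  # return immediately after mutating the deque
    # Fallback: classic work stealing, take the oldest task available.
    for victim in victims:
        if victim.queue:
            return victim.queue.popleft()
    return None


def run(workers):
    """Round-robin simulation; returns (worker, task, data_was_local) records."""
    log = []
    while any(w.queue for w in workers):
        for w in workers:
            others = [v for v in workers if v is not w]
            task = w.pop_local() or steal(w, others)
            if task is None:
                continue
            log.append((w.name, task.name, task.tile in w.resident))
            w.resident.add(task.tile)  # the tile is now cached on this worker
    return log


# Usage: one CPU worker owns all tasks; the GPU worker must steal.
cpu = Worker("cpu0", resident={"A00"})
gpu = Worker("gpu0", resident={"A11"})
cpu.queue.extend([Task("t0", "A00"), Task("t1", "A11"), Task("t2", "A22")])
log = run([cpu, gpu])
```

In this toy run the GPU worker skips the oldest task `t0` and steals `t1`, because tile `A11` is already in its memory, so no transfer would be needed. A real runtime would also have to weigh transfer costs, data validity across devices, and concurrent access to the queues, none of which this sketch models.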

Tags: Computer science, CUDA, Factorization, Heterogeneous systems, Linear Algebra, nVidia, Package, Tesla C2050

January 25, 2013 by hgpu
