high performance computing on graphics processing units: hgpu.org


Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory

Michela Becchi, Surendra Byna, Srihari Cadambi, Srimat Chakradhar

NEC Laboratories America, Inc., Princeton, NJ, USA

In Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures (2010), pp. 82-91.

DOI:10.1145/1810479.1810498

@conference{becchi2010data,
  title={Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory},
  author={Becchi, M. and Byna, S. and Cadambi, S. and Chakradhar, S.},
  booktitle={Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures},
  pages={82--91},
  year={2010},
  organization={ACM}
}


In this paper, we describe a runtime that automatically enhances the performance of applications running on heterogeneous platforms consisting of a multi-core CPU and a throughput-oriented many-core GPU. The CPU and GPU are connected by a non-coherent interconnect such as PCI-E and therefore do not share memory. Heterogeneous platforms available today, such as [9], are of this type. Our goal is to let the programmer use such a system seamlessly, without rewriting the application and with minimal knowledge of the underlying architectural details. Assuming that applications call computational kernels for which both CPU and GPU implementations are available, our runtime achieves this goal by automatically scheduling the kernels and managing data placement. In particular, it intercepts function calls to well-known computational kernels and schedules each call on the CPU or the GPU based on the size and location of its arguments. To improve performance, it defers all data transfers between the CPU and the GPU until they are necessary. By managing data placement transparently to the programmer, it provides a unified memory view despite the separate underlying memory subsystems. We experimentally evaluate our runtime on a heterogeneous platform consisting of a 2.5 GHz quad-core Xeon CPU and an NVIDIA C870 GPU. Using array sorting, parallel reduction, dense and sparse matrix operations, and ranking as computational kernels, we apply our runtime to automatically retarget SSI [25], K-means [32], and two synthetic applications to this platform with no code changes. We find that, in most cases, performance improves when the computation is moved to the data rather than vice versa. For instance, even if a particular instance of a kernel is slower on the GPU than on the CPU, the overall application may run faster if that kernel is scheduled on the GPU anyway, especially when its data already resides in GPU memory because of prior scheduling decisions.
Our results show that data-aware CPU/GPU scheduling improves performance by up to 25% over the best data-agnostic scheduling on the same platform.
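The two mechanisms the abstract describes, deferring CPU/GPU transfers until data is actually needed and choosing a device from argument size and location, can be illustrated with a minimal sketch. It assumes a simple additive cost model (estimated kernel time on a device plus the PCI-E time to move any input not already valid there) and only simulates the copies. All names (`Buffer`, `KernelCall`, `schedule_data_aware`, `schedule_data_agnostic`, `run`, `PCIE_BYTES_PER_SEC`) and the bandwidth figure are hypothetical; this is not the authors' runtime, which intercepts native kernel library calls, and its actual policy may differ.

```python
from dataclasses import dataclass, field

# Hypothetical effective PCI-E bandwidth; a real runtime would measure it.
PCIE_BYTES_PER_SEC = 4e9


@dataclass
class Buffer:
    """A logical array whose valid copy may be in CPU memory, GPU memory, or both."""
    name: str
    nbytes: int
    valid_on: set = field(default_factory=lambda: {"cpu"})

    def transfer_cost(self, device: str) -> float:
        # Zero if the device already holds a valid copy, else bytes / bandwidth.
        if device in self.valid_on:
            return 0.0
        return self.nbytes / PCIE_BYTES_PER_SEC

    def acquire(self, device: str) -> None:
        # Deferred transfer: copy only when a kernel (or the host) actually
        # needs the data on `device`. The copy itself is only simulated here.
        self.valid_on.add(device)

    def mark_written(self, device: str) -> None:
        # After a write, only the writer's copy is valid; other copies are stale.
        self.valid_on = {device}


@dataclass
class KernelCall:
    name: str
    inputs: list   # buffers read by the kernel
    outputs: list  # buffers written by the kernel (treated as write-only)
    est_seconds: dict  # hypothetical per-device compute estimates


def schedule_data_agnostic(call: KernelCall) -> str:
    # Picks the device with the fastest kernel, ignoring where the data lives.
    return min(call.est_seconds, key=call.est_seconds.get)


def schedule_data_aware(call: KernelCall) -> str:
    # Adds the cost of moving non-resident inputs to each device's compute time.
    def total_cost(device: str) -> float:
        moved = sum(b.transfer_cost(device) for b in call.inputs)
        return call.est_seconds[device] + moved
    return min(call.est_seconds, key=total_cost)


def run(call: KernelCall) -> str:
    # Schedule, pull inputs to the chosen device on demand, record output location.
    device = schedule_data_aware(call)
    for b in call.inputs:
        b.acquire(device)
    for b in call.outputs:
        b.mark_written(device)
    return device
```

With made-up numbers, a 400 MB array starting on the host and a sort estimated at 1.0 s on the CPU and 0.2 s on the GPU, the sort goes to the GPU (0.2 s plus 0.1 s of transfer). A follow-up reduction on the 400 MB sorted output, estimated at 0.05 s on the CPU and 0.08 s on the GPU, would go to the CPU under the data-agnostic policy, but the data-aware policy keeps it on the GPU because pulling the data back would add 0.1 s. A kernel that updates a buffer in place should list it among both its inputs and its outputs.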

Tags: Computer science, CUDA, Distributed data structures, Heterogeneous systems, nVidia, Performance, Programming Languages, Sparse matrix, Tesla C870

May 10, 2011 by hgpu

