high performance computing on graphics processing units: hgpu.org


An Automatic Input-Sensitive Approach for Heterogeneous Task Partitioning

Klaus Kofler, Ivan Grasso, Biagio Cosenza, Thomas Fahringer

Institute of Computer Science, University of Innsbruck, Austria

27th ACM international conference on Supercomputing, 2013


Unleashing the full potential of heterogeneous systems, consisting of multi-core CPUs and GPUs, is challenging because the computational resources differ in processing capabilities, memory availability, and communication latencies. In this paper we propose a novel approach that automatically optimizes task partitioning for different (input) problem sizes and different heterogeneous architectures. We use the Insieme source-to-source compiler to translate a single-device OpenCL program into a multi-device OpenCL program. The Insieme Runtime System then performs dynamic task partitioning based on an offline-generated prediction model. To derive the prediction model, we use a machine learning approach based on Artificial Neural Networks (ANN) that incorporates static program features as well as dynamic, input-sensitive features. Principal component analysis has been used to further improve the task partitioning. Our approach has been evaluated on a suite of 23 programs and achieves performance improvements of 22% and 25% over executing the benchmarks on a single CPU and on a single GPU, respectively, which corresponds to 87.5% of the optimal performance.
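
The abstract does not describe how the runtime divides a kernel's index space among devices once the model has chosen a partitioning. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a 1-D NDRange and a model that outputs one non-negative share per device (for example CPU and GPU). The function name `partition_range` and the largest-remainder rounding are hypothetical choices made here for illustration. Each resulting (offset, size) pair could serve as the global offset and global size of one device's kernel launch.

```python
from typing import List, Tuple


def partition_range(global_size: int, fractions: List[float]) -> List[Tuple[int, int]]:
    """Split the index space [0, global_size) into contiguous (offset, size)
    chunks, one per device.

    `fractions` is a hypothetical per-device share (e.g. predicted by a model);
    it is normalised here, so it does not need to sum exactly to 1.
    """
    if global_size < 0:
        raise ValueError("global_size must be non-negative")
    if not fractions or any(f < 0 for f in fractions):
        raise ValueError("fractions must be non-empty and non-negative")
    total = sum(fractions)
    if total == 0:
        raise ValueError("at least one fraction must be positive")

    # Real-valued ideal number of work-items per device.
    ideal = [global_size * f / total for f in fractions]
    # Round down first (values are non-negative, so int() acts as floor).
    sizes = [int(x) for x in ideal]

    # Largest-remainder rounding: hand the leftover work-items to the devices
    # whose ideal share was truncated the most, so the sizes sum to global_size.
    leftover = global_size - sum(sizes)
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1

    # Convert sizes into contiguous (offset, size) chunks.
    chunks = []
    offset = 0
    for size in sizes:
        chunks.append((offset, size))
        offset += size
    return chunks


if __name__ == "__main__":
    # Hypothetical 25% CPU / 75% GPU split of 1000 work-items.
    print(partition_range(1000, [0.25, 0.75]))
```

Keeping each device's chunk contiguous means a device only needs an offset and a size, and a 0% share simply yields an empty chunk, which covers the single-device cases as well.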

Tags: ATI, ATI Radeon HD 5870, Code generation, Compilers, Computer science, Heterogeneous systems, Machine learning, Neural networks, nVidia, nVidia GeForce GTX 480, OpenCL, Performance, Task scheduling

April 21, 2013 by hgpu
