high performance computing on graphics processing units: hgpu.org


Synergistic execution of stream programs on multicores with accelerators

Abhishek Udupa, R. Govindarajan, Matthew J. Thazhuthaveetil

Department of Computer Science and Automation, Indian Institute of Science

In LCTES ’09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems (2009), pp. 99-108.

DOI:10.1145/1542452.1542466

@article{udupa2009synergistic,
  title={Synergistic execution of stream programs on multicores with accelerators},
  author={Udupa, A. and Govindarajan, R. and Thazhuthaveetil, M.J.},
  journal={ACM Sigplan Notices},
  volume={44},
  number={7},
  pages={99--108},
  issn={0362-1340},
  year={2009},
  publisher={ACM}
}


Package: StreamIt


The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multicore architectures. StreamIt graphs describe task, data, and pipeline parallelism, which can be exploited on accelerators such as Graphics Processing Units (GPUs) or the CellBE, which support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of a StreamIt program on a multicore platform equipped with an accelerator. The proposed approach uses profiling to identify the relative benefits of executing a task on the superscalar CPU cores and on the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies of data transfers and the buffer layout transformations that the partitioning requires, as an integrated Integer Linear Program (ILP), which can then be solved by an ILP solver. We also propose an efficient heuristic algorithm for partitioning the work between the CPU and the GPU, which provides solutions that are within 9.05% of the optimal solution on average across the benchmark suite. The partitioned tasks are then software pipelined to execute on the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between the CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.84X, with a maximum of 51.96X, over single-threaded CPU execution across the StreamIt benchmarks. This is an 18.9% improvement over a partitioning strategy that maps onto the CPU only the filters that cannot be executed on the GPU (the filters with state that persists across firings).
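The partitioning idea in the abstract can be illustrated with a toy sketch. This is not the paper's ILP formulation or its heuristic: the filter names, the profiled times, the single TRANSFER constant, and the cost model (the larger of the CPU load and the GPU load plus transfer cost) are all assumptions made up for illustration, and exhaustive search stands in for an ILP solver. The sketch compares three strategies on a linear pipeline: the baseline that places only stateful filters on the CPU, an exhaustive optimum, and a simple greedy refinement of the baseline.

```python
from itertools import product

# Hypothetical profile data (not from the paper):
# (filter name, CPU time, GPU time, has persistent state)
FILTERS = [
    ("source", 4.0, 6.0, True),
    ("fir", 20.0, 2.0, False),
    ("fft", 30.0, 3.0, False),
    ("scale", 1.0, 4.0, False),
    ("sink", 5.0, 7.0, True),
]
TRANSFER = 1.5  # assumed cost of each pipeline edge that crosses CPU <-> GPU


def cost(assign):
    """Assumed steady-state cost: the busier side bounds throughput."""
    cpu = sum(f[1] for f, dev in zip(FILTERS, assign) if dev == "cpu")
    gpu = sum(f[2] for f, dev in zip(FILTERS, assign) if dev == "gpu")
    crossings = sum(a != b for a, b in zip(assign, assign[1:]))
    return max(cpu, gpu + TRANSFER * crossings)


def feasible(assign):
    """Stateful filters must stay on the CPU."""
    return all(dev == "cpu" for f, dev in zip(FILTERS, assign) if f[3])


def baseline():
    """Map only the stateful filters onto the CPU, everything else to the GPU."""
    return tuple("cpu" if f[3] else "gpu" for f in FILTERS)


def exhaustive():
    """Brute-force optimum; a stand-in for solving an ILP on a tiny instance."""
    candidates = (
        a for a in product(("cpu", "gpu"), repeat=len(FILTERS)) if feasible(a)
    )
    return min(candidates, key=cost)


def greedy():
    """Flip one stateless filter at a time while the cost strictly improves."""
    assign = list(baseline())
    improved = True
    while improved:
        improved = False
        for i, f in enumerate(FILTERS):
            if f[3]:
                continue  # stateful filters are pinned to the CPU
            trial = assign.copy()
            trial[i] = "cpu" if assign[i] == "gpu" else "gpu"
            if cost(trial) < cost(assign):
                assign = trial
                improved = True
    return tuple(assign)


if __name__ == "__main__":
    for name, fn in (("baseline", baseline), ("greedy", greedy), ("optimal", exhaustive)):
        assignment = fn()
        print(name, assignment, cost(assignment))
```

On this made-up instance, moving the cheap "scale" filter from the GPU to the CPU lowers the modelled cost, which mirrors the abstract's point that a profile-guided partition can beat the strategy of sending every stateless filter to the GPU. A realistic model would also need per-edge transfer sizes, buffer layout transformations, and multiple CPU cores, all of which this sketch omits.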

Tags: Computer science, CUDA, High-level Languages, nVidia, nVidia GeForce 8800 GTS, Package

November 23, 2010 by hgpu
