Exploiting Concurrency Patterns with Heterogeneous Task and Data Parallelism


A Geetha Venkatesh

Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore

Indian Institute of Science, 2014

@article{venkatesh2014exploiting,
  title={Exploiting Concurrency Patterns with Heterogeneous Task and Data Parallelism},
  author={Venkatesh, A Geetha},
  year={2014}
}


Parallel programming of an application requires not only domain knowledge of the application but also programming-environment support and an in-depth awareness of the target architecture. Often, not all concurrency features of the architecture are exposed to the programming environment. The challenge lies in using these unexposed features efficiently to write effective parallel programs. In our work, we explore different modes of OpenCL programming and focus on a specific application to reach its best performance. We have chosen iterative Strassen’s matrix multiplication as our test application because it exhibits a variable amount of parallelism in each step and iteration. We explore a few parallel manifestations of the application, based on a fixed memory hierarchy and accounting for environmental constraints. These manifestations exploit the various types of parallelism exhibited by the application: data parallelism, task parallelism, or a combination of both. Concurrent Collections (CnC) is an architecture-agnostic programming model designed in favor of the application developer. The dynamic execution model of CnC results in the best possible performance of the application. In CnC, the parallelism of an application is expressed at the level of computational steps. Exploiting the parallelism within a coarse-grained computational step is a challenging task for CnC auto-tuners, and the ease of programming with CnC suffers when we need to express fine-grained parallelism. OpenCL allows fine-grained programming through its hierarchical kernel structure. Through this work, we propose a close realization of the CnC programming model using OpenCL that achieves both fine-grained parallelism and dynamic execution. We evaluate our work on two different architectures: an nVidia Fermi C2070 GPGPU and an Intel Core i3-350M 64-bit processor.
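To make the "variable amount of parallelism in each step" concrete, the following is a minimal sketch of a single level of Strassen's multiplication in plain Python. It is not the thesis's OpenCL or CnC implementation: the function names (`strassen_one_level`, `split`, and the helpers) are hypothetical, and the thread pool only marks which work items are independent rather than delivering real speedup. The ten block additions are element-wise (data parallel), the seven block products are mutually independent (task parallel), and the final combination is again element-wise.

```python
from concurrent.futures import ThreadPoolExecutor


def add(X, Y):
    # Element-wise sum: every output element is independent (data parallel).
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    # Element-wise difference, also data parallel.
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matmul(X, Y):
    # Plain row-by-column product, used for each of the seven block products.
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(M):
    # Split a square matrix of even order into four equal quadrants.
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen_one_level(A, B):
    """One level of Strassen's algorithm (illustrative sketch, even order only)."""
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Step 1: ten independent block additions/subtractions.
    operands = [
        (add(A11, A22), add(B11, B22)),  # M1
        (add(A21, A22), B11),            # M2
        (A11, sub(B12, B22)),            # M3
        (A22, sub(B21, B11)),            # M4
        (add(A11, A12), B22),            # M5
        (sub(A21, A11), add(B11, B12)),  # M6
        (sub(A12, A22), add(B21, B22)),  # M7
    ]

    # Step 2: seven independent block products, submitted as separate tasks.
    with ThreadPoolExecutor(max_workers=7) as pool:
        M1, M2, M3, M4, M5, M6, M7 = pool.map(lambda p: matmul(*p), operands)

    # Step 3: combine the products into the four quadrants of C.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Calling `strassen_one_level(A, B)` on two square matrices of even order returns the same product as `matmul(A, B)`. The point of the sketch is the shape of the work: step 2 offers at most seven coarse tasks, while steps 1 and 3 offer one fine-grained work item per matrix element, which is the kind of mix the abstract describes.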

Tags: Computer science, Data parallelism, Heterogeneous systems, Matrix multiplication, nVidia, OpenCL, Tesla C2070, Thesis

February 3, 2015 by hgpu
