
Exploiting Concurrency Patterns with Heterogeneous Task and Data Parallelism

A Geetha Venkatesh
Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore
Indian Institute of Science, 2014

@article{venkatesh2014exploiting,
   title={Exploiting Concurrency Patterns with Heterogeneous Task and Data Parallelism},
   author={Venkatesh, A Geetha},
   institution={Supercomputer Education and Research Centre, Indian Institute of Science},
   year={2014}
}


Parallel programming of an application requires not only domain knowledge of the application but also programming environment support and in-depth awareness of the target architecture. Often, not all concurrency features of the architecture are exposed to the programming environment. The challenge lies in using these unexposed features efficiently to write effective parallel programs. In our work, we explore different modes of OpenCL programming and focus on a specific application to reach its best performance. We have chosen iterative Strassen’s matrix multiplication as our test application because it exhibits a variable amount of parallelism in each step and iteration. We explore several parallel manifestations of the application, based on a fixed memory hierarchy and accounting for environmental constraints. These manifestations exploit the types of parallelism the application exhibits: data parallelism, task parallelism, or a combination of both.

Concurrent Collections (CnC) is an architecture-agnostic programming model designed in favor of the application developer. The dynamic execution model of CnC results in the best possible performance of the application. In CnC, the parallelism of an application is expressed at the level of computational steps. Exploiting the parallelism within a coarse-grained computational step is a challenging task for CnC auto-tuners, and the ease of programming with CnC suffers when fine-grained parallelism must be expressed. OpenCL, by contrast, allows fine-grained programming through its hierarchical kernel structure. Through this work, we propose a close realization of the CnC programming model using OpenCL to achieve both fine-grained parallelism and dynamic execution. We evaluate our work on two different architectures: an NVIDIA Fermi C2070 GPGPU and an Intel Core i3-350M 64-bit processor.
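To make the mix of task and data parallelism in Strassen's algorithm concrete, the following is a minimal Python sketch of a single Strassen level. It is not the thesis's OpenCL or CnC implementation: the function names (`strassen_one_level`, `split`, `matmul`) and the use of a thread pool are assumptions made purely for illustration. The seven block products are independent of one another (task parallelism), while each individual product is an ordinary matrix multiply whose output elements could be computed in parallel (data parallelism).

```python
from concurrent.futures import ThreadPoolExecutor


def add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matmul(X, Y):
    """Conventional matrix product; every output element is independent."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(M):
    """Split an even-sized square matrix into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen_one_level(A, B):
    """One level of Strassen's method for even-sized square matrices."""
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Operand pairs for the seven Strassen products M1..M7.
    operands = [
        (add(A11, A22), add(B11, B22)),  # M1
        (add(A21, A22), B11),            # M2
        (A11, sub(B12, B22)),            # M3
        (A22, sub(B21, B11)),            # M4
        (add(A11, A12), B22),            # M5
        (sub(A21, A11), add(B11, B12)),  # M6
        (sub(A12, A22), add(B21, B22)),  # M7
    ]

    # The seven products have no dependencies on each other, so they are
    # submitted as separate tasks. The thread pool only illustrates that
    # independence; it is an assumption of this sketch, not a tuned runtime.
    with ThreadPoolExecutor(max_workers=7) as pool:
        M1, M2, M3, M4, M5, M6, M7 = pool.map(lambda p: matmul(*p), operands)

    # Combine the products into the four quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

In this sketch, the amount of available parallelism changes from phase to phase: the operand additions, the seven products, and the final combination each offer a different number of independent units of work, which is the kind of variation the abstract refers to.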