Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters

Swadhin K. Mangaraj
Department of Computer and Information Science, Software and Systems, Linkoping University
Linkoping University, 2013


   title={Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters},

   author={Mangaraj, Swadhin K.},



Download Download (PDF)   View View   Source Source   



SkePU (Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems) is a parallel computing framework developed by Johan Enmyren and Christoph Kessler at Linkopings Universitet. This C++ template library provides a simple and unified interface for specifying data-parallel computations with the help of skeletons and is targeted to multiple backends e.g. for a sequential CPU, parallel CPUs using MPI and OpenMP or GPUs using CUDA and OpenCL. SkePU is comprised of seven data-parallel skeletons and one task-parallel skeleton and these skeletons use two types of containers: vector and matrix to model real-life parallel applications. In this thesis, we address the extension of the SkePU framework by extending the matrix container (which stores 2-D data values) that can efficiently use the existing skeletons to develop parallel scientific applications on large-scale clusters using MPI. This piece of work focuses on the distribution of the matrix among the participating processes which after receiving their share of data can execute the application in parallel. This work covers all of the seven data-parallel skeletons. Each skeleton has been tested with a small application program. In addition to measurement of performance improvement from the application program’s execution time, we have also done a communication cost analysis for all skeletons with MPI using the LogGP model. In order to evaluate and test the operational efficiency of the extension, we have considered a PDE solver application. Through this application, we have demonstrated the performance gain and scalability of the extended framework. The performance improvement was more when computational load dominates the memory I/O operations. The results show that using the extension can serve as a viable approach while implementing real-life parallel applications on large-scale clusters.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: