Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters

hgpu.org » Applications » Computer science » Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters

Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters

Swadhin K. Mangaraj

Department of Computer and Information Science, Software and Systems, Linkoping University

Linkoping University, 2013

@thesis{mangaraj2013extension,

title={Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters},

author={Mangaraj, Swadhin K.},

year={2013}

}

Download (PDF)

View

Source

2365

views

SkePU (Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems) is a parallel computing framework developed by Johan Enmyren and Christoph Kessler at Linkopings Universitet. This C++ template library provides a simple and unified interface for specifying data-parallel computations with the help of skeletons and is targeted to multiple backends e.g. for a sequential CPU, parallel CPUs using MPI and OpenMP or GPUs using CUDA and OpenCL. SkePU is comprised of seven data-parallel skeletons and one task-parallel skeleton and these skeletons use two types of containers: vector and matrix to model real-life parallel applications. In this thesis, we address the extension of the SkePU framework by extending the matrix container (which stores 2-D data values) that can efficiently use the existing skeletons to develop parallel scientific applications on large-scale clusters using MPI. This piece of work focuses on the distribution of the matrix among the participating processes which after receiving their share of data can execute the application in parallel. This work covers all of the seven data-parallel skeletons. Each skeleton has been tested with a small application program. In addition to measurement of performance improvement from the application program’s execution time, we have also done a communication cost analysis for all skeletons with MPI using the LogGP model. In order to evaluate and test the operational efficiency of the extension, we have considered a PDE solver application. Through this application, we have demonstrated the performance gain and scalability of the extended framework. The performance improvement was more when computational load dominates the memory I/O operations. The results show that using the extension can serve as a viable approach while implementing real-life parallel applications on large-scale clusters.

Tags: Computer science, CUDA, GPU cluster, High-level Languages, MPI, nVidia, OpenCL, Thesis

October 29, 2013 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

high performance computing on graphics processing units: hgpu.org