high performance computing on graphics processing units: hgpu.org

Model-based optimization of MPDATA on Intel Xeon Phi through load imbalancing

Alexey Lastovetsky, Lukasz Szustak, Roman Wyrzykowski

University College Dublin, Belfield, Dublin 4, Ireland

arXiv:1507.01265 [cs.DC], (5 Jul 2015)

@article{lastovetsky2015modelbased,
   title={Model-based optimization of MPDATA on Intel Xeon Phi through load imbalancing},
   author={Lastovetsky, Alexey and Szustak, Lukasz and Wyrzykowski, Roman},
   year={2015},
   month={jul},
   archivePrefix={"arXiv"},
   primaryClass={cs.DC}
}

Load balancing is a widely accepted technique for optimizing the performance of scientific applications on parallel architectures. Balanced applications do not waste processor cycles waiting at points of synchronization and data exchange, and thereby maximize processor utilization. In this paper, we challenge the universality of load balancing as an approach to optimizing the performance of parallel applications. First, we formulate the conditions that the performance profile of an application must satisfy for the application to achieve its best performance via load balancing. Then we use a real-life scientific application, MPDATA, to demonstrate that its performance profile on a modern parallel architecture, Intel Xeon Phi, deviates significantly from these conditions. Based on this observation, we propose a method for optimizing the performance of scientific applications through load imbalancing. We also propose an algorithm that finds the optimal, possibly imbalanced, configuration of a data-parallel application on a set of homogeneous processors. This algorithm uses functional performance models of the application to find the partitioning that minimizes its computation time but does not necessarily balance the load of the processors. We show how to apply this algorithm to the optimization of MPDATA on Intel Xeon Phi. Experimental results demonstrate that the performance of this carefully optimized load-balanced application can be further improved by 15% using the proposed load-imbalancing optimization.
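
The idea of a possibly imbalanced optimal partitioning can be illustrated with a minimal sketch. It is not the algorithm from the paper: it is a plain dynamic program, and it assumes that the functional performance model is available as a hypothetical table `time_of`, where `time_of[x]` is the measured time for one processor to handle `x` workload units. The names `optimal_partition`, `time_of`, `n`, and `p` are all illustrative. The sketch searches for the split of `n` units across `p` identical processors that minimizes the slowest processor's time, without requiring equal shares.

```python
def optimal_partition(time_of, n, p):
    """Split n workload units over p identical processors.

    time_of[x] is the (assumed, tabulated) execution time of x units on
    one processor. Returns (makespan, parts) where parts sums to n and
    makespan = max(time_of[x] for x in parts) is minimal.
    """
    inf = float("inf")
    # best[k][m]: minimal makespan when m units are spread over k processors
    best = [[inf] * (n + 1) for _ in range(p + 1)]
    # choice[k][m]: units given to the k-th processor in that optimum
    choice = [[0] * (n + 1) for _ in range(p + 1)]
    best[0][0] = 0.0

    for k in range(1, p + 1):
        for m in range(n + 1):
            for x in range(m + 1):
                cand = max(time_of[x], best[k - 1][m - x])
                if cand < best[k][m]:
                    best[k][m] = cand
                    choice[k][m] = x

    # Walk back through the recorded choices to recover the partition.
    parts, m = [], n
    for k in range(p, 0, -1):
        parts.append(choice[k][m])
        m -= choice[k][m]
    return best[p][n], parts


# Illustrative, made-up profile with a slow spot at 4 units.
time_of = [0, 1, 2, 3, 10, 5, 6, 7, 8]
makespan, parts = optimal_partition(time_of, n=8, p=2)
print(makespan, parts)
```

With this made-up profile, the balanced split (4 + 4) costs 10 time units, while the imbalanced split (3 + 5) costs 5. The exhaustive search runs in O(p·n²) time, so it only suits small tables; it is meant to show why a non-smooth performance profile can favor imbalance, not to serve as a production partitioner.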

Tags: Computer science, Intel Xeon Phi, Performance

July 8, 2015 by hgpu
