Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters

Mohammed A. Noaman Al-hayanni, Rishad Shafik, Ashur Rafiev, Fei Xia, Alex Yakovlev

Technical Report Series NCL-EEE-MICRO-TR-2017-205, 2017

@article{al2017speedup,
  title={Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters},
  author={Al-hayanni, Mohammed A Noaman and Shafik, Rishad and Rafiev, Ashur and Xia, Fei and Yakovlev, Alex},
  year={2017}
}

Traditional speedup models, such as Amdahl’s law, facilitate the study of how parallel workloads behave on many-core systems. However, these models are typically based on software characteristics and assume ideal hardware behavior. As such, their applicability to energy- and/or performance-driven system optimization is limited by two factors: first, speedup cannot be measured without instrumenting the original software code; second, the parallelization factor of an application running on specific hardware is generally unknown. In this paper, we propose a novel method in which standard performance counters found in modern many-core platforms are used to derive speedup without instrumenting applications for time measurements. We postulate that speedup can be accurately estimated as the ratio of the instructions per cycle of a parallel many-core system to the instructions per cycle of a single-core system. By studying, for the first time, both application instructions and system instructions, our method determines the parallelization factor and the optimal system configuration for energy and/or performance. The method is demonstrated through extensive experiments on three different platforms with core counts ranging from 4 to 61, running parallel benchmark applications (including synthetic and PARSEC benchmarks) on the Linux operating system. Speedup and parallelization estimates obtained with our method, together with extensive cross-validation, show errors of at most 8% on these systems. Additionally, we demonstrate the effectiveness of our method in exploring parallelization-aware, energy-efficient configurations for many-core systems using energy-delay-product-based formulations.
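The idea in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function names are hypothetical, the parallel IPC is assumed to be the aggregate instruction count across all cores divided by the elapsed cycles of the run, the parallelization factor is recovered by inverting the classical Amdahl formula (the report may use a different model), and the energy-delay product is taken in its common form, energy multiplied by execution time.

```python
def ipc(instructions: float, cycles: float) -> float:
    """Instructions per cycle from two raw performance-counter readings."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def speedup_from_ipc(ipc_parallel: float, ipc_single: float) -> float:
    """Speedup estimated as the ratio of parallel IPC to single-core IPC."""
    if ipc_single <= 0:
        raise ValueError("single-core IPC must be positive")
    return ipc_parallel / ipc_single


def amdahl_parallel_fraction(speedup: float, n_cores: int) -> float:
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / n), for p.

    Assumption: the classical Amdahl model is used here only for
    illustration; it is not confirmed to be the report's exact model.
    """
    if n_cores < 2:
        raise ValueError("need at least two cores to infer p")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)


def energy_delay_product(power_watts: float, time_s: float) -> float:
    """EDP = energy * time = (power * time) * time."""
    return power_watts * time_s * time_s


if __name__ == "__main__":
    # Made-up counter readings, for illustration only.
    single = ipc(instructions=1.0e9, cycles=1.0e9)
    parallel = ipc(instructions=1.0e9, cycles=0.4e9)  # 4 cores, aggregate
    s = speedup_from_ipc(parallel, single)
    p = amdahl_parallel_fraction(s, n_cores=4)
    print(f"speedup = {s:.2f}, parallel fraction = {p:.2f}")
```

Given per-configuration power and run time, `energy_delay_product` could then be evaluated for each candidate core count to compare configurations; the numbers above are invented and carry no experimental meaning.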

Tags: Computer science, Energy-efficient computing, Intel Xeon Phi, Performance

June 5, 2017 by hgpu
