19656

Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing

Ahmed E. Helal
Virginia Polytechnic Institute and State University
Virginia Tech, 2020

@phdthesis{helal2020automated,

   title={Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing},

   author={Helal, Ahmed Elmohamadi Mohamed},

   year={2020},

   school={Virginia Tech}

}

Download Download (PDF)   View View   Source Source   Source codes Source codes

Package:

455

views

In the last decade, there have been tectonic shifts in computer hardware because of reaching the physical limits of the sequential CPU performance. As a consequence, current high-performance computing (HPC) systems integrate a wide variety of compute resources with different capabilities and execution models, ranging from multi-core CPUs to many-core accelerators. While such heterogeneous systems can enable dramatic acceleration of user applications, extracting optimal performance via manual analysis and optimization is a complicated and time-consuming process. This dissertation presents graph-structured program representations to reason about the performance bottlenecks on modern HPC systems and to guide novel automation frameworks for performance analysis and modeling and runtime adaptation. The proposed program representations exploit domain knowledge and capture the inherent computation and communication patterns in user applications, at multiple levels of computational granularity, via compiler analysis and dynamic instrumentation. The empirical results demonstrate that the introduced modeling frameworks accurately estimate the realizable parallel performance and scalability of a given sequential code when ported to heterogeneous HPC systems. As a result, these frameworks enable efficient workload distribution schemes that utilize all the available compute resources in a performance-proportional way. In addition, the proposed runtime adaptation frameworks significantly improve the end-to-end performance of important real-world applications which suffer from limited parallelism and fine-grained data dependencies. Specifically, compared to the state-of-the-art methods, such an adaptive parallel execution achieves up to an order-of-magnitude speedup on the target HPC systems while preserving the inherent data dependencies of user applications.
Rating: 3.5/5. From 2 votes.
Please wait...

* * *

* * *

HGPU group © 2010-2020 hgpu.org

All rights belong to the respective authors

Contact us: