
Measurement and Analysis of GPU-accelerated Applications with HPCToolkit

Keren Zhou, Laksono Adhianto, Jonathon Anderson, Aaron Cherian, Dejan Grubisic, Mark Krentel, Yumeng Liu, Xiaozhu Meng, John Mellor-Crummey
Department of Computer Science, Rice University, Houston, TX
arXiv:2109.06931 [cs.DC] (14 Sep 2021)

@article{Zhou_2021,
   title={Measurement and analysis of GPU-accelerated applications with HPCToolkit},
   volume={108},
   ISSN={0167-8191},
   url={http://dx.doi.org/10.1016/j.parco.2021.102837},
   DOI={10.1016/j.parco.2021.102837},
   journal={Parallel Computing},
   publisher={Elsevier BV},
   author={Zhou, Keren and Adhianto, Laksono and Anderson, Jonathon and Cherian, Aaron and Grubisic, Dejan and Krentel, Mark and Liu, Yumeng and Meng, Xiaozhu and Mellor-Crummey, John},
   year={2021},
   month={Dec},
   pages={102837}
}


To address the challenge of performance analysis on the US DOE’s forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit’s measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkit’s new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.
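
As a rough illustration of what attributing metrics to calling contexts that span both CPUs and GPUs can look like, the following is a minimal sketch of a calling context tree in Python. It is not HPCToolkit's implementation or API: the class `CCTNode`, the function `attribute`, the frame names, and the sample values are all hypothetical and chosen only for this example. GPU frames are simply placed below the CPU frame that launched them, so inclusive metrics roll up across the CPU/GPU boundary.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One frame in a calling context tree (hypothetical, for illustration)."""
    name: str
    exclusive: float = 0.0  # metric attributed directly to this context
    children: dict = field(default_factory=dict)

    def child(self, name):
        # Return the child frame with this name, creating it on first use.
        if name not in self.children:
            self.children[name] = CCTNode(name)
        return self.children[name]

    def inclusive(self):
        # Metric for this context plus everything called beneath it.
        return self.exclusive + sum(c.inclusive() for c in self.children.values())


def attribute(root, call_path, value):
    """Add `value` to the context reached by following `call_path` from `root`."""
    node = root
    for frame in call_path:
        node = node.child(frame)
    node.exclusive += value


root = CCTNode("<program root>")

# Made-up samples: (call path, seconds). Frames tagged "[gpu]" stand for
# device code and hang below the CPU frame that launched the kernel.
attribute(root, ["main", "solve", "launch_kernel", "[gpu] stencil_kernel"], 4.0)
attribute(root, ["main", "solve", "launch_kernel", "[gpu] stencil_kernel",
                 "[gpu] load_tile"], 1.5)
attribute(root, ["main", "solve"], 0.5)
attribute(root, ["main", "io"], 1.0)

solve = root.child("main").child("solve")
print(f"solve inclusive: {solve.inclusive()}")
print(f"program inclusive: {root.inclusive()}")
```

In this sketch, the time spent in the hypothetical GPU kernel is charged to `solve` as well, which is the kind of whole-application view the abstract describes; how HPCToolkit actually builds and stores these contexts is covered in the paper itself.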
