Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms

Lucas Leandro Nesi, Samuel Thibault, Luka Stanisic, Lucas Mello Schnorr
Institute of Informatics/PPGC/UFRGS, Porto Alegre, Brazil
hal-02275363, (31 August 2019)

@inproceedings{leandronesi:hal-02275363,
   title={Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms},
   author={Leandro Nesi, Lucas and Thibault, Samuel and Stanisic, Luka and Mello Schnorr, Lucas},
   url={https://hal.inria.fr/hal-02275363},
   booktitle={2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)},
   address={Larnaca, Cyprus},
   publisher={IEEE},
   pages={142--151},
   year={2019},
   month={May},
   doi={10.1109/CCGRID.2019.00025},
   pdf={https://hal.inria.fr/hal-02275363/file/CCGRID_camera_ready.pdf},
   hal_id={hal-02275363},
   hal_version={v1}
}

Programming parallel applications for heterogeneous HPC platforms is much more straightforward when using the task-based programming paradigm. The simplicity exists because a runtime takes care of many activities usually carried out by the application developer, such as task mapping, load balancing, and memory management operations. In this paper, we present a visualization-based performance analysis methodology to investigate the CPU-GPU-Disk memory management of the StarPU runtime, a popular task-based middleware for HPC applications. We detail the design of novel graphical strategies that were fundamental to recognizing performance problems in four case studies. We first identify poor management of data handles when GPU memory is saturated, leading to low application performance. Our experiments using the dense tile-based Cholesky factorization show that our fix leads to performance gains of 66% and better scalability for larger input sizes. In the other three cases, we study scenarios where the main memory is insufficient to store all the application’s data, forcing the runtime to store data out-of-core. Using our methodology, we pinpoint differences in behavior among schedulers and identify a crucial problem in the application code regarding initial block placement, which leads to poor performance.