29155

Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality

Adrian Perez Dieguez, Min Choi, Mahmut Okyay, Mauro Del Ben, Bryan M. Wong, Khaled Z. Ibrahim
Lawrence Berkeley National Laboratory, CA, USA
arXiv:2403.08131 [cs.DC], (12 Mar 2024)

@misc{dieguez2024costeffective,

   title={Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality},

   author={Adrian Perez Dieguez and Min Choi and Mahmut Okyay and Mauro Del Ben and Bryan M. Wong and Khaled Z. Ibrahim},

   year={2024},

   eprint={2403.08131},

   archivePrefix={arXiv},

   primaryClass={cs.DC}

}

Download Download (PDF)   View View   Source Source   

568

views

Tuning searches are pivotal in High-Performance Computing (HPC), addressing complex optimization challenges in computational applications. The complexity arises not only from finely tuning parameters within routines but also potential interdependencies among them, rendering traditional optimization methods inefficient. Instead of scrutinizing interdependencies among parameters and routines, practitioners often face the dilemma of conducting independent tuning searches for each routine, thereby overlooking interdependence, or pursuing a more resource-intensive joint search for all routines. This decision is driven by the consideration that some interdependence analysis and high-dimensional decomposition techniques in literature may be prohibitively expensive in HPC tuning searches. Our methodology adapts and refines these methods to ensure computational feasibility while maximizing performance gains in real-world scenarios. Our methodology leverages a cost-effective interdependence analysis to decide whether to merge several tuning searches into a joint search or conduct orthogonal searches. Tested on synthetic functions with varying levels of parameter interdependence, our methodology efficiently explores the search space. In comparison to Bayesian-optimization-based full independent or fully joint searches, our methodology suggested an optimized breakdown of independent and merged searches that led to final configurations up to 8% more accurate, reducing the search time by up to 95%. When applied to GPU-offloaded Real-Time Time-Dependent Density Functional Theory (RT-TDDFT), an application in computational materials science that challenges modern HPC autotuners, our methodology achieved an effective tuning search. Its adaptability and efficiency extend beyond RT-TDDFT, making it valuable for related applications in HPC.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: