SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving
Technische Universität Berlin, Germany
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
@inproceedings{fan2023synergy,
title={SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving},
author={Fan, Kaijie and D'Antonio, Marco and Carpentieri, Lorenzo and Cosenza, Biagio and Ficarelli, Federico and Cesarini, Daniele},
booktitle={International Conference for High Performance Computing, Networking, Storage and Analysis (SC)},
year={2023}
}
Energy-efficient computing uses power management techniques such as frequency scaling to save energy. Implementing energy-efficient techniques on large-scale computing systems is challenging for several reasons. While most modern architectures, including GPUs, are capable of frequency scaling, these features are often not available on large systems. In addition, achieving higher energy savings requires precise energy tuning because not only applications but also different kernels can have different energy characteristics. We propose SYnergy, a novel energy-efficient approach that spans languages, compilers, runtimes, and job schedulers to achieve unprecedented fine-grained energy savings on large-scale heterogeneous clusters. SYnergy defines an extension to the SYCL programming model that allows programmers to define a specific energy goal for each kernel. For example, a kernel can aim to minimize well-known energy metrics such as EDP and ED2P or to achieve predefined energy-performance tradeoffs, such as the best performance with 25% energy savings. Through compiler integration and a machine learning model, each kernel is statically optimized for the specific target. On large computing systems, a SLURM plugin allows SYnergy to run on all available devices in the cluster, providing scalable energy savings. The methodology is inherently portable and has been evaluated on both NVIDIA and AMD GPUs. Experimental results show unprecedented improvements in energy and energy-related metrics on real-world applications, as well as scalable energy savings on a 64-GPU cluster.
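The energy metrics named in the abstract can be made concrete with a small sketch. The energy-delay product (EDP) is energy multiplied by execution time, and ED2P is energy multiplied by the square of execution time, which weights performance more heavily. The code below is an illustration only, not SYnergy's implementation: the abstract describes a compiler pass and a machine learning model that choose the frequency statically, whereas this sketch simply searches a table of per-frequency measurements. The `Sample` type, the function names, and all numbers in `SAMPLES` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Sample:
    """One hypothetical measurement of a kernel at a fixed core frequency."""
    freq_mhz: int    # core frequency setting
    energy_j: float  # energy consumed by the kernel, in joules
    time_s: float    # kernel execution time, in seconds


def edp(s: Sample) -> float:
    """Energy-delay product: E * T."""
    return s.energy_j * s.time_s


def ed2p(s: Sample) -> float:
    """Energy-delay-squared product: E * T^2."""
    return s.energy_j * s.time_s ** 2


def min_metric(samples: Sequence[Sample],
               metric: Callable[[Sample], float]) -> Sample:
    """Return the sample that minimizes the given metric."""
    return min(samples, key=metric)


def best_perf_with_saving(samples: Sequence[Sample],
                          baseline: Sample,
                          saving: float) -> Optional[Sample]:
    """Return the fastest sample that uses at most (1 - saving) of the
    baseline energy, or None if no frequency meets the budget."""
    budget = baseline.energy_j * (1.0 - saving)
    feasible = [s for s in samples if s.energy_j <= budget]
    if not feasible:
        return None
    return min(feasible, key=lambda s: s.time_s)


# Illustrative numbers only; they are not taken from the paper.
SAMPLES = [
    Sample(freq_mhz=600, energy_j=8.0, time_s=2.0),
    Sample(freq_mhz=900, energy_j=7.0, time_s=1.5),
    Sample(freq_mhz=1200, energy_j=9.0, time_s=1.2),
    Sample(freq_mhz=1500, energy_j=12.0, time_s=1.0),
]
BASELINE = SAMPLES[-1]  # assume the highest frequency is the default

if __name__ == "__main__":
    print("min EDP :", min_metric(SAMPLES, edp).freq_mhz, "MHz")
    print("min ED2P:", min_metric(SAMPLES, ed2p).freq_mhz, "MHz")
    choice = best_perf_with_saving(SAMPLES, BASELINE, saving=0.25)
    print("best performance with 25% energy saving:",
          choice.freq_mhz if choice else None, "MHz")
```

With these made-up numbers the three goals pick three different frequencies, which is the reason a per-kernel energy target is useful: the right setting depends on which tradeoff the programmer asks for.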
August 13, 2023 by hgpu