Understanding the Power of Evolutionary Computation for GPU Code Optimization
Arizona State University, Tempe, AZ
arXiv:2208.12350 [cs.SE], (25 Aug 2022)
@misc{https://doi.org/10.48550/arxiv.2208.12350,
doi={10.48550/ARXIV.2208.12350},
url={https://arxiv.org/abs/2208.12350},
author={Liou, Jhe-Yu and Awan, Muaaz and Hofmeyr, Steven and Forrest, Stephanie and Wu, Carole-Jean},
keywords={Software Engineering (cs.SE), Distributed, Parallel, and Cluster Computing (cs.DC), Performance (cs.PF), FOS: Computer and information sciences},
title={Understanding the Power of Evolutionary Computation for GPU Code Optimization},
publisher={arXiv},
year={2022},
copyright={Creative Commons Attribution 4.0 International}
}
Achieving high performance for GPU codes requires developers to have significant knowledge of parallel programming and GPU architectures, as well as an in-depth understanding of the application. This combination makes it challenging to find performance optimizations for GPU-based applications, especially in scientific computing. This paper shows that the tool GEVO can achieve significant speedups over human-optimized GPU code on two quite different scientific workloads. GEVO uses evolutionary computation to find code edits that improve the runtime of a multiple sequence alignment kernel and a SARS-CoV-2 simulation by 28.9% and 29%, respectively. Further, when GEVO begins with an early, unoptimized version of the sequence alignment program, it finds an impressive 30 times speedup, a performance improvement similar to that of the hand-tuned version. This work presents an in-depth analysis of the discovered optimizations, revealing that the primary sources of improvement vary across applications; that most of the optimizations generalize across GPU architectures; and that several of the most important optimizations involve significant code interdependencies. The results showcase the potential of automated program optimization tools to help reduce the optimization burden for scientific computing developers and enhance performance portability for domain-specific accelerators.
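To make the idea of searching for runtime-improving code edits concrete, here is a minimal, mutation-only sketch of an evolutionary loop. It is not GEVO's implementation: the toy program, the instruction costs, the "required" flags standing in for a test suite, and every function name below are assumptions made purely for illustration.

```python
import random

# Hypothetical toy "kernel": (instruction name, simulated cost, required for correctness).
PROGRAM = [
    ("load", 4, True),
    ("redundant_sync", 6, False),
    ("compute", 10, True),
    ("bounds_check", 3, False),
    ("store", 4, True),
]


def runtime(deleted):
    """Simulated runtime: total cost of the instructions that were not deleted."""
    return sum(cost for i, (_, cost, _) in enumerate(PROGRAM) if i not in deleted)


def passes_tests(deleted):
    """Stand-in for a test suite: a variant is valid if no required instruction is deleted."""
    return all(not PROGRAM[i][2] for i in deleted)


def mutate(deleted, rng):
    """Toggle the deletion of one randomly chosen instruction."""
    return frozenset(set(deleted) ^ {rng.randrange(len(PROGRAM))})


def evolve(generations=30, pop_size=8, seed=0):
    """Keep the fastest valid variants each generation (elitist selection)."""
    rng = random.Random(seed)
    population = [frozenset()] * pop_size  # start from the unmodified program
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in population]
        # Discard variants that fail the tests, then rank the rest by runtime.
        valid = [v for v in population + offspring if passes_tests(v)]
        valid.sort(key=runtime)
        population = valid[:pop_size]
    return population[0]


if __name__ == "__main__":
    best = evolve()
    print("deleted instructions:", sorted(PROGRAM[i][0] for i in best))
    print("runtime:", runtime(best), "baseline:", runtime(frozenset()))
```

In this sketch an individual is simply a set of deleted instruction indices, and fitness is a simulated cost rather than a measured GPU runtime. A real system would have to compile and execute each variant and check its output against test cases.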
September 4, 2022 by hgpu