ANGHABENCH: a Suite with One Million Compilable C Benchmarks for Code-Size Reduction
Department of Informatics, UEM, Brazil
@article{daangh2020abench,
title={ANGHABENCH: a Suite with One Million Compilable C Benchmarks for Code-Size Reduction},
author={da Silva, Anderson Faustino and Kind, Bruno Conde and de Souza Magalhaes, Jos{\'e} Wesley and Rocha, Jer{\^o}nimo Nunes and Guimaraes, Breno Campos Ferreira and Pereira, Fernando Magno Quintao},
year={2020}
}
A predictive compiler uses properties of a program to decide how to optimize it. The compiler is trained on a collection of programs to derive a model that determines its actions when it faces unknown code. One of the challenges of predictive compilation is finding good training sets. Regardless of the programming language, the availability of human-made benchmarks is limited. Moreover, current synthesizers produce code that is very different from actual programs, and mining compilable code from open repositories is difficult due to program dependencies. In this paper, we use a combination of web crawling and type inference to overcome these problems for the C programming language. We use a type reconstructor based on Hindley-Milner's algorithm to produce ANGHABENCH, a virtually unlimited collection of real-world compilable C programs. Although ANGHABENCH programs are not executable, they can be transformed into object files by any C-compliant compiler; therefore, they can be used to train compilers for code-size reduction. We have used thousands of ANGHABENCH programs to train YACOS, a predictive compiler based on LLVM. The version of YACOS autotuned with ANGHABENCH generates binaries for the LLVM test suite that are over 10% smaller than those produced by clang -Oz. It even compresses code that is impervious to the state-of-the-art Function Sequence Alignment technique published in 2019, because it does not require large binaries to work well.
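To illustrate the kind of transformation that type reconstruction enables, the sketch below shows a hypothetical function mined from an open repository together with the minimal declarations a Hindley-Milner-style reconstructor could synthesize so the file compiles in isolation. The names, the struct layout, and the helper prototype are illustrative assumptions, not output of the ANGHABENCH generator itself.

/* Hypothetical example: a mined function refers to a struct and a helper
 * whose definitions live in headers that were never downloaded.  A type
 * reconstructor can synthesize just enough declarations (first block) for
 * the snippet to become a self-contained, compilable translation unit. */

/* --- declarations synthesized by type inference (assumed names) --- */
struct packet {
    int   len;
    char *payload;
};
int checksum(const char *buf, int len);

/* --- original mined function, kept verbatim --- */
int validate_packet(struct packet *p) {
    if (p == 0 || p->len <= 0)
        return 0;
    return checksum(p->payload, p->len) == 0;
}

/* The resulting file is compilable but not executable (checksum has no
 * body), which is enough to produce an object file for code-size studies:
 *   clang -c -Oz example.c -o example.o
 */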
December 20, 2020 by hgpu