Easy-to-Use On-the-Fly Binary Program Acceleration on Many-Cores
Karlsruhe Institute of Technology
arXiv:1412.3906 [cs.DC], (12 Dec 2014)
This paper introduces Binary Acceleration At Runtime (BAAR), an easy-to-use on-the-fly binary acceleration mechanism which aims to tackle the problem of enabling existent software to automatically utilize accelerators at runtime. BAAR is based on the LLVM Compiler Infrastructure and has a client-server architecture. The client runs the program to be accelerated in an environment which allows program analysis and profiling. Program parts which are identified as suitable for the available accelerator are exported and sent to the server. The server optimizes these program parts for the accelerator and provides RPC execution for the client. The client transforms its program to utilize accelerated execution on the server for offloaded program parts. We evaluate our work with a proof-of-concept implementation of BAAR that uses an Intel Xeon Phi 5110P as the acceleration target and performs automatic offloading, parallelization and vectorization of suitable program parts. The practicality of BAAR for real-world examples is shown based on a study of stencil codes. Our results show a speedup of up to 4x without any developer-provided hints and 5.77x with hints over the same code compiled with the Intel Compiler at optimization level O2 and running on an Intel Xeon E5-2670 machine. Based on our insights gained during implementation and evaluation we outline future directions of research, e.g., offloading more fine-granular program parts than functions, a more sophisticated communication mechanism or introducing on-stack-replacement.
December 15, 2014 by hgpu