CFD code adaptation to the FPGA architecture
Czestochowa University of Technology, Czestochowa, Poland
The International Journal of High Performance Computing Applications, Vol. 35(1) 33–46, 2021
@article{rojek2021cfd,
title={CFD code adaptation to the FPGA architecture},
author={Rojek, Krzysztof and Halbiniak, Kamil and Kuczynski, Lukasz},
journal={The International Journal of High Performance Computing Applications},
volume={35},
number={1},
pages={33–46},
year={2021},
publisher={SAGE Publications Sage UK: London, England}
}
For the last years, we observe the intensive development of accelerated computing platforms. Although current trends indicate a well-established position of GPU devices in the HPC environment, FPGA (Field-Programmable Gate Array) aspires to be an alternative solution to offload the CPU computation. This paper presents a systematic adaptation of four various CFD (Computational Fluids Dynamic) kernels to the Xilinx Alveo U250 FPGA. The goal of this paper is to investigate the potential of the FPGA architecture as the future infrastructure able to provide the most complex numerical simulations in the area of fluid flow modeling. The selected kernels are customized to a real-scientific scenario, compatible with the EULAG (Eulerian/semi-Lagrangian) fluid solver. The solver is used to simulate thermo-fluid flows across a wide range of scales and is extensively used in numerical weather prediction. The proposed adaptation is focused on the analysis of the strengths and weaknesses of the FPGA accelerator, considering performance and energy efficiency. The proposed adaptation is compared with a CPU implementation that was strongly optimized to provide realistic and objective benchmarks. The performance results are compared with a set of server CPUs containing various Intel generations, including Intel SkyLake-based CPUs as Xeon Gold 6148 and Xeon Platinum 8168, as well as Intel Xeon E5-2695 CPU based on the IvyBridge architecture. Since all the kernels belong to the group of memory-bound algorithms, our main challenge is to saturate global memory bandwidth and provide data locality with the intensive BRAM (Block RAM) reusing. Our adaptation allows us to reduce the performance per watt up to 80% compared to the CPUs.
January 17, 2021 by hgpu