FPGA-based acceleration of a particle simulation High Performance Computing application

Aldo Conte
Politecnico di Torino
Politecnico di Torino, 2019


   title={FPGA-based acceleration of a particle simulation High Performance Computing application},

   author={Lavagno, Luciano and Conte, Aldo and Brandino, Dott Giuseppe Piero},






This thesis studies the possibility of introducing FPGAs into High Performance Computing (HPC) systems. Such systems are typically hybrid platforms that exploit the massively parallel computation of GPUs to reach very high performance. However, GPU-based systems are power-hungry, and their power consumption is so large that running and maintaining them can become technologically and economically too expensive. This thesis was carried out within the EU-funded ExaNeSt project, which aims to prototype energy-efficient solutions for building exascale-level supercomputers. The low-power requirement is addressed with a Multiprocessor System-on-Chip, namely a system that integrates an ARM processor and an UltraScale+ FPGA in the same package; the whole module has been designed with special attention to power consumption. HPC systems are very important in computational science, and this thesis investigates the use of FPGA accelerators to offload the compute-intensive parts of a molecular dynamics code. miniMD is a simple, parallel molecular dynamics (MD) code composed of five OpenCL kernels (neighbor_bin, neighbor_build, force_compute, integrate_initial, integrate_final), designed for studying the physical movements of atoms and molecules. In the first part of the work, each kernel was studied to understand which kernels could be accelerated on the FPGA of the Multiprocessor SoC architecture. Baselining the full miniMD application showed that building the list of neighbour particles for each molecule of the system (neighbor_build kernel) and computing the forces (force_compute kernel) are the most compute-intensive tasks and account for a prominent share of the total execution time.
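To make the roles of the two dominant kernels concrete, the following is a minimal Python sketch of a neighbour-list build followed by a force computation. It is not the thesis's OpenCL code: the function names `build_neighbors` and `compute_forces`, the brute-force all-pairs search (in place of miniMD's binning step), and the Lennard-Jones pair potential in reduced units are all assumptions made for illustration only.

```python
def build_neighbors(pos, cutoff):
    """Brute-force neighbour list (hypothetical stand-in for neighbor_build).

    For each atom i, collect the indices of all other atoms closer than cutoff.
    """
    cut2 = cutoff * cutoff
    neighbors = [[] for _ in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            d2 = sum((pos[i][k] - pos[j][k]) ** 2 for k in range(3))
            if d2 < cut2:
                neighbors[i].append(j)
    return neighbors


def compute_forces(pos, neighbors, cutoff):
    """Pairwise forces (hypothetical stand-in for force_compute).

    Assumes a Lennard-Jones potential with epsilon = sigma = 1.
    """
    cut2 = cutoff * cutoff
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i, neigh in enumerate(neighbors):
        for j in neigh:
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            if r2 >= cut2:
                continue
            inv_r2 = 1.0 / r2
            inv_r6 = inv_r2 ** 3
            # F = 24 * (2 / r^12 - 1 / r^6) / r^2 * d
            scale = 24.0 * inv_r6 * (2.0 * inv_r6 - 1.0) * inv_r2
            for k in range(3):
                forces[i][k] += scale * d[k]
    return forces


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    nb = build_neighbors(atoms, cutoff=2.5)
    print(nb)
    print(compute_forces(atoms, nb, cutoff=2.5))
```

Both functions contain a loop over atoms with an inner loop whose trip count depends on the data (the number of neighbours within the cutoff), which is consistent with these two tasks dominating the execution time.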
Therefore, High-Level Synthesis was used to generate the RTL code of each of the above kernels directly. Specifically, for the "neighbor_build" and "force_compute" kernels, several optimizations were applied to the original OpenCL code to accelerate execution on the FPGA. The most important optimizations turned out to concern how the data handled by the kernels is read from and written to the global memory of the FPGA. These optimizations were meant to promote burst memory transactions and to use the limited bandwidth between the external DDR memory and the Programmable Logic (PL) efficiently, thereby reducing external memory access time. It is worth noting that loops were not easy to speed up in these optimizations because of their imperfectly bounded nature. Finally, all the kernels were merged into a single Vivado design of the miniMD application, to be run later on the Zynq UltraScale+ device.
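The benefit of promoting burst transactions can be illustrated with a toy model, sketched below in Python. It is not taken from the thesis: the function `count_bursts` is hypothetical, and the model assumes that reads to consecutive word addresses can be merged into one burst of at most 256 beats (the AXI4 limit for incrementing bursts), ignoring alignment, address-boundary, and latency effects.

```python
def count_bursts(addresses, max_burst_len=256):
    """Count the bursts needed to serve word addresses in the given order.

    Toy model (assumption): an access extends the current burst only if it
    targets the word right after the previous one and the burst is not full.
    """
    bursts = 0
    run_len = 0
    prev = None
    for addr in addresses:
        if prev is not None and addr == prev + 1 and run_len < max_burst_len:
            run_len += 1      # contiguous access: extend the current burst
        else:
            bursts += 1       # gap or full burst: start a new transaction
            run_len = 1
        prev = addr
    return bursts


if __name__ == "__main__":
    # Contiguous block copy, as when a kernel first stages data in a local buffer.
    print(count_bursts(range(1024)))
    # Scattered gather, as when positions are fetched through neighbour indices.
    print(count_bursts([7, 3, 900, 12]))
```

Under these assumptions a contiguous 1024-word read needs 4 transactions, while 4 scattered words also need 4, one per word. This suggests why restructuring kernel accesses into contiguous block copies uses the DDR-to-PL bandwidth more efficiently.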
