high performance computing on graphics processing units: hgpu.org

FPGA-based acceleration of a particle simulation High Performance Computing application

Aldo Conte

Politecnico di Torino

Politecnico di Torino, 2019

@article{lavagno2019fpga,
  title={FPGA-based acceleration of a particle simulation High Performance Computing application},
  author={Lavagno, Luciano and Conte, Aldo and Brandino, Dott Giuseppe Piero},
  year={2019}
}

This thesis studies the possibility of introducing FPGAs into High Performance Computing (HPC) systems. Such systems are typically hybrid platforms that exploit the massive parallelism of GPUs to reach very high performance. However, GPU-based systems are power-hungry, and their power consumption is so large that running and maintaining them can become technologically and economically too expensive. The thesis is part of ExaNeSt, an EU-funded project whose purpose is to prototype energy-efficient solutions for exascale-level supercomputers. The low-power requirement is addressed with a Multiprocessor System-on-Chip, namely a system that integrates an ARM processor and an UltraScale+ FPGA in the same package; the whole module has been designed with special attention to power consumption. HPC systems are very important in computational science, and this thesis investigates the use of FPGA accelerators to offload the compute-intensive parts of a molecular dynamics code. miniMD is a simple, parallel molecular dynamics (MD) code composed of five OpenCL kernels (neighbor_bin, neighbor_build, force_compute, integrate_initial, integrate_final), designed for studying the physical movements of atoms and molecules. In the first part of the thesis, each kernel was studied to understand which kernels could be accelerated on the FPGA of the Multiprocessor SoC architecture. Baselining the full miniMD application showed that building the list of neighbour particles for each molecule of the system (neighbor_build kernel) and computing the forces (force_compute kernel) are the most compute-intensive tasks and account for a prominent share of the total execution time.
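
The two kernels singled out above can be illustrated with a plain C reference sketch. This is not the thesis code or the miniMD source: the names `neighbor_build_ref`, `force_compute_ref`, `Vec3` and `MAX_NEIGHBORS` are hypothetical, the neighbor search is brute force rather than bin-based, and the force is assumed to be a Lennard-Jones pair force in reduced units (epsilon = sigma = 1).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEIGHBORS 64 /* hypothetical per-atom neighbor capacity */

typedef struct { float x, y, z; } Vec3;

/* Brute-force neighbor list: atom j is a neighbor of atom i when their
 * squared distance is below cutoff_sq. A binned search would avoid the
 * O(n^2) scan; it is omitted here to keep the sketch short. */
void neighbor_build_ref(const Vec3 *pos, size_t n, float cutoff_sq,
                        int *neighbors, int *num_neighbors)
{
    assert(pos && neighbors && num_neighbors);
    for (size_t i = 0; i < n; ++i) {
        int count = 0;
        for (size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            float dx = pos[i].x - pos[j].x;
            float dy = pos[i].y - pos[j].y;
            float dz = pos[i].z - pos[j].z;
            float r2 = dx * dx + dy * dy + dz * dz;
            if (r2 < cutoff_sq && count < MAX_NEIGHBORS)
                neighbors[i * MAX_NEIGHBORS + count++] = (int)j;
        }
        num_neighbors[i] = count;
    }
}

/* Assumed Lennard-Jones pair force in reduced units:
 * F_i = sum_j 48 * r^-8 * (r^-6 - 0.5) * (r_i - r_j), for r^2 < cutoff_sq. */
void force_compute_ref(const Vec3 *pos, size_t n, float cutoff_sq,
                       const int *neighbors, const int *num_neighbors,
                       Vec3 *force)
{
    assert(pos && neighbors && num_neighbors && force);
    for (size_t i = 0; i < n; ++i) {
        Vec3 f = {0.0f, 0.0f, 0.0f};
        for (int k = 0; k < num_neighbors[i]; ++k) {
            int j = neighbors[i * MAX_NEIGHBORS + k];
            float dx = pos[i].x - pos[j].x;
            float dy = pos[i].y - pos[j].y;
            float dz = pos[i].z - pos[j].z;
            float r2 = dx * dx + dy * dy + dz * dz;
            if (r2 < cutoff_sq) {
                float sr2 = 1.0f / r2;
                float sr6 = sr2 * sr2 * sr2;
                float fpair = 48.0f * sr6 * (sr6 - 0.5f) * sr2;
                f.x += dx * fpair;
                f.y += dy * fpair;
                f.z += dz * fpair;
            }
        }
        force[i] = f;
    }
}
```

A caller would allocate `neighbors` with `n * MAX_NEIGHBORS` entries, call `neighbor_build_ref` once (or every few time steps), and then call `force_compute_ref` every step. The inner loop of the force routine runs a different number of iterations for each atom, which is the kind of data-dependent loop that tends to be hard to pipeline in hardware.
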
Therefore, High-Level Synthesis was used to generate the RTL code of each of the above kernels directly. Specifically, for the "neighbor_build" and "force_compute" kernels, several optimizations were applied to the original OpenCL code to accelerate its execution on the FPGA. The most important optimizations turned out to be those applied to the transfers of kernel data to and from the global memory of the FPGA. These optimizations were meant to promote burst memory transactions and to make efficient use of the limited bandwidth between the external DDR memory and the Programmable Logic (PL), thereby reducing the access time to external memory. Note that, within these optimizations, loops were not easy to speed up because they are not perfectly bounded. Finally, all the kernels were merged into a single Vivado design of the miniMD application, to be run later on the Zynq UltraScale+ device.
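
The memory and loop optimizations described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from the thesis: the function `sum_neighbor_weights` and the constants `TILE` and `NEIGHBOR_CAP` are hypothetical, and "not perfectly bounded" is read here as "the trip count depends on the data". The sketch copies contiguous blocks into local buffers, an access pattern that HLS tools can typically turn into burst transfers, and replaces a data-dependent inner loop bound with a fixed bound plus a guard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 16        /* hypothetical number of atoms processed per block */
#define NEIGHBOR_CAP 8 /* hypothetical fixed neighbor capacity per atom */

/* For each atom, sum the weights of its listed neighbors.
 * The *_global pointers stand in for arrays in external DDR memory. */
void sum_neighbor_weights(const float *weights_global,
                          const int *neighbors_global,
                          const int *num_neighbors_global,
                          size_t n, float *out_global)
{
    assert(weights_global && neighbors_global && num_neighbors_global);
    assert(out_global);

    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;
        int local_nbr[TILE * NEIGHBOR_CAP];
        int local_cnt[TILE];
        float local_out[TILE];

        /* Contiguous block reads: one long sequential transfer per array
         * instead of many scattered single-word accesses. */
        memcpy(local_nbr, neighbors_global + base * NEIGHBOR_CAP,
               len * NEIGHBOR_CAP * sizeof(int));
        memcpy(local_cnt, num_neighbors_global + base, len * sizeof(int));

        for (size_t t = 0; t < len; ++t) {
            float acc = 0.0f;
            /* Fixed trip count with a guard, instead of looping up to
             * local_cnt[t]; a constant bound is easier to pipeline. */
            for (int k = 0; k < NEIGHBOR_CAP; ++k) {
                if (k < local_cnt[t]) {
                    /* This gather is still a scattered access. */
                    acc += weights_global[local_nbr[t * NEIGHBOR_CAP + k]];
                }
            }
            local_out[t] = acc;
        }

        /* Contiguous block write of the results. */
        memcpy(out_global + base, local_out, len * sizeof(float));
    }
}
```

The trade-off in this sketch is that the fixed-bound loop always executes `NEIGHBOR_CAP` iterations, so atoms with few neighbors waste cycles; whether that pays off depends on how well the tool can pipeline the loop once its bound is constant.
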

Tags: FPGA, Molecular dynamics, OpenCL, Particle simulation, Physics, Thesis

May 8, 2019 by hgpu
