high performance computing on graphics processing units: hgpu.org

High Performance Portable Tsunami Simulations on Many-core CPU, GPU, and FPGA

Fumiya Kono

Department of Computer and Information Systems, The University of Aizu

The University of Aizu, 2018

DOI:10.15016/00000149

A tsunami generated by a submarine earthquake can cause serious damage in coastal areas. To reduce this damage, effective evacuation and disaster prevention are attracting growing interest. Computer simulations can contribute by forecasting the arrival time and height of a tsunami. However, tsunami simulations require massive data processing. The shallow water equations used for tsunami modeling require the wave height, wave speed, and sea depth at each point of the computation grid, and the total number of grid points exceeds several million. Although a sequential computation on a single-core CPU can complete a tsunami simulation, technologies that complete the simulation as fast as possible are desired in order to reduce tsunami damage. Modern computer systems offer various architectures for parallel computation. Modern CPUs are designed as multicore systems. GPUs (Graphics Processing Units) were initially introduced to accelerate image processing; because they also deliver high performance for parallel computation, they are now applied to general-purpose computation (GPGPU). FPGAs are also attractive because they combine high-performance computation with low power consumption. As these architectures have appeared, parallel computing technologies such as OpenMP, OpenACC, CUDA, and OpenCL have also been introduced. In this dissertation, we developed various parallel codes that aim to accelerate the MOST algorithm for tsunami modeling. We benchmarked our parallel codes on various modern architectures: Intel Xeon, Intel Xeon Phi, NVIDIA Tesla GPU, AMD FirePro GPU, AMD Radeon GPU, and Arria 10 FPGA. We evaluated the performance of each computation and investigated the optimal implementation of the MOST algorithm. Currently, the best result, 185 GFlops, is achieved by an OpenCL kernel with no optimization on an AMD Radeon R9 280X GPU.
The computation time is 2.41 seconds for 300 time steps, which correspond to 5 minutes of real time. Therefore, our computation using OpenCL on the Radeon GPU is applicable to a real tsunami prediction system. The FPGA design presented in this dissertation is based on OpenCL kernel programming. High-level synthesis (HLS), the technology that generates FPGA designs from OpenCL kernels, has recently become practical. We evaluated the performance of FPGA designs generated by a compiler supported by Intel. To achieve better performance on the FPGA, we optimized our GPU kernel codes for the FPGA by implementing loop unrolling so that the compiler can exploit shift registers for the computation. The performance of our FPGA design is further improved by computing multiple grid points in one pipeline. Furthermore, HLS methodology has become more sophisticated in recent years. We compared FPGA designs generated by two compilers with regard to performance, resource utilization, and efficiency of floating-point operations. The design generated by the newer compiler reaches 153 GFlops, more than twice the performance of the design generated by the older one. Finally, we discussed the applicability of our parallel implementations to real-time tsunami simulation based on the phase velocity of the wave, which is derived from the shallow water equations. Here we used the result of the OpenCL kernel on the Radeon GPU, which achieved the highest performance of all combinations. We first showed the scalability of our computation and calculated the computation time for updating one computation grid. Afterwards, we tested our OpenCL kernel under initial conditions based on past earthquakes and tsunamis: the earthquake near the west coast of America in 2005, and the 2011 Tohoku earthquake and tsunami in Japan. The estimated computation time for each scenario is sufficiently short compared with the actual arrival time of the tsunami.
With regard to the computation time required for numerical simulations, we conclude that the performance of our implementation is sufficient for real-time tsunami simulation.
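The real-time argument in the abstract can be illustrated with a short calculation. The sketch below is not taken from the dissertation. It uses only the figures quoted above (300 time steps, 5 minutes of simulated time, 2.41 seconds of computation) together with the standard long-wave phase velocity c = sqrt(g h) of the shallow water equations. The ocean depth and source-to-coast distance are hypothetical values chosen for illustration, and the estimate assumes a constant depth and the same grid size and cost per step as the quoted benchmark.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def phase_velocity(depth_m: float) -> float:
    """Long-wave phase velocity c = sqrt(g * h) from shallow water theory."""
    return math.sqrt(G * depth_m)


def real_time_ratio(simulated_s: float, compute_s: float) -> float:
    """How many seconds of tsunami propagation are simulated per second of computing."""
    return simulated_s / compute_s


# Figures quoted in the abstract.
STEPS = 300
SIMULATED_S = 5 * 60   # 5 minutes of real time
COMPUTE_S = 2.41       # measured on the Radeon R9 280X

dt = SIMULATED_S / STEPS                 # simulated seconds per time step
per_step_ms = COMPUTE_S / STEPS * 1e3    # wall-clock milliseconds per time step
ratio = real_time_ratio(SIMULATED_S, COMPUTE_S)

# Hypothetical scenario (assumed values, not from the source).
depth_m = 4000.0       # assumed uniform ocean depth
distance_km = 200.0    # assumed distance from source to coast

c = phase_velocity(depth_m)
arrival_s = distance_km * 1000.0 / c
compute_for_arrival_s = arrival_s / ratio

print(f"time step: {dt:.1f} s, cost per step: {per_step_ms:.2f} ms")
print(f"simulation runs about {ratio:.0f}x faster than real time")
print(f"phase velocity at {depth_m:.0f} m depth: {c:.0f} m/s")
print(f"wave arrival after about {arrival_s / 60:.1f} min, "
      f"simulated in about {compute_for_arrival_s:.1f} s")
```

Under these assumptions each time step covers one second of propagation and costs roughly 8 ms, so the simulation runs more than a hundred times faster than the wave travels. A real forecast would depend on the actual bathymetry and grid size, which this sketch does not model.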

Tags: AMD FirePro W8100, AMD FirePro W9100, AMD Radeon R9 280X, ATI, CUDA, Earth and Space Sciences, FPGA, Intel Xeon Phi, Numerical simulation, nVidia, OpenCL, Tesla K20, Thesis

December 9, 2018 by hgpu
