high performance computing on graphics processing units: hgpu.org

FDTD on Distributed Heterogeneous Multi-GPU Systems

Eirik Myklebost

Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science

Norwegian University of Science and Technology, 2014

@article{myklebost2014fdtd,
  title={FDTD on Distributed Heterogeneous Multi-GPU Systems},
  author={Myklebost, Eirik},
  year={2014},
  publisher={Institutt for datateknikk og informasjonsvitenskap}
}

Finite-Difference Time-Domain (FDTD) is a popular technique for modeling computational electrodynamics, and is used within many research areas, such as the development of antennas, ultrasound imaging, and seismic wave propagation. Simulating large domains can, however, be very demanding in both compute and memory, which has motivated the use of cluster computing and, more recently, Graphics Processing Units (GPUs).

Previous work, Andreas Berg Skomedal's master's thesis from May 2013, includes a heterogeneous FDTD implementation, in the sense that it schedules domains between a CPU and a GPU on a single system. That implementation is a benchmarking code based on the Yee_bench code by Ulf Andersson, and focuses on the performance of simulating many small individual FDTD domains.

This thesis introduces a new FDTD implementation based on the work by Skomedal and Andersson. The code is written in C++ and CUDA, and uses a decomposition approach as opposed to scheduling, which allows larger domains to be divided among multiple execution units. It supports the use of both a CPU and several CUDA-capable GPUs on a single system, in addition to multi-node execution through the Message Passing Interface (MPI). A discussion of the differences between the CUDA-capable GPU architectures, and how they affect the performance of the FDTD algorithm, is also included.

The results show a performance increase of 66% when simulating large domains on two GPUs compared to a single GPU. Using the CPU in addition to one or two fast GPUs is shown to give a slight improvement, but the main advantage is the possibility of simulating larger domains. Results from multi-node executions are also included, but they show poor performance because the runs were severely limited by a 100 Mbit/s Ethernet connection.

The work of this thesis includes a working FDTD decomposition implementation that can be executed on a cluster of heterogeneous systems, each with a multi-core CPU and one or several CUDA-capable GPUs. It is also written with the intention that it should be easily extendable to work with non-CUDA-capable GPUs. As with the previous work by Skomedal and Andersson, this implementation is only a benchmarking code, and is not suited for real-world problems. It is instead intended to be used as a basis for future work, or as an example of how to do FDTD on a cluster of heterogeneous multi-GPU systems.
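The decomposition approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis code: it assumes a 1D Yee grid, runs serially on the CPU, and uses plain in-process copies where a real implementation would use MPI messages or GPU memory transfers. The names `Subdomain`, `update_h`, `update_e`, and `simulate`, the update coefficient, and the Gaussian initial pulse are all hypothetical choices made for illustration. Each subdomain owns a contiguous slice of cells plus one ghost (halo) cell on each side, and neighbours exchange halo values before each half-step.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One slice of a 1D Yee grid (illustrative only). Indices 1..n are owned
// cells; indices 0 and n+1 are ghost cells filled from the neighbours.
struct Subdomain {
    std::size_t n;
    std::vector<double> e, h;
    explicit Subdomain(std::size_t cells)
        : n(cells), e(cells + 2, 0.0), h(cells + 2, 0.0) {}
};

// H sits half a cell to the right of E, so it reads the right-hand E value.
void update_h(Subdomain& s, double c) {
    for (std::size_t k = 1; k <= s.n; ++k) s.h[k] += c * (s.e[k + 1] - s.e[k]);
}

// E reads the H value half a cell to its left.
void update_e(Subdomain& s, double c) {
    for (std::size_t k = 1; k <= s.n; ++k) s.e[k] += c * (s.h[k] - s.h[k - 1]);
}

// Runs `steps` time steps on `cells` cells split into `parts` subdomains and
// returns the gathered E field. Ghost cells at the outer edges stay zero.
std::vector<double> simulate(std::size_t cells, std::size_t steps, std::size_t parts) {
    assert(parts >= 1 && cells >= parts);  // every subdomain needs >= 1 cell
    const double c = 0.5;                  // assumed update coefficient
    std::vector<Subdomain> subs;
    std::size_t start = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t n = cells / parts + (p < cells % parts ? 1 : 0);
        subs.emplace_back(n);
        // Initial Gaussian pulse in E, defined by the global cell index.
        for (std::size_t k = 1; k <= n; ++k) {
            const double x = static_cast<double>(start + k - 1)
                           - static_cast<double>(cells) / 2.0;
            subs[p].e[k] = std::exp(-x * x / 50.0);
        }
        start += n;
    }
    for (std::size_t t = 0; t < steps; ++t) {
        // E halo: each subdomain needs the first E value of its right neighbour.
        for (std::size_t p = 0; p + 1 < parts; ++p)
            subs[p].e[subs[p].n + 1] = subs[p + 1].e[1];
        for (Subdomain& s : subs) update_h(s, c);
        // H halo: each subdomain needs the last H value of its left neighbour.
        for (std::size_t p = 1; p < parts; ++p)
            subs[p].h[0] = subs[p - 1].h[subs[p - 1].n];
        for (Subdomain& s : subs) update_e(s, c);
    }
    std::vector<double> global;
    global.reserve(cells);
    for (const Subdomain& s : subs)
        global.insert(global.end(), s.e.begin() + 1, s.e.end() - 1);
    return global;
}
```

Calling `simulate(64, 40, 1)` and `simulate(64, 40, 2)` should give the same field up to rounding, since the halo exchange supplies exactly the values a single undivided grid would read. In a multi-GPU or multi-node setting the two copy loops are where the communication cost arises, which is consistent with the abstract's observation that a slow interconnect limits multi-node performance.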

Tags: Algorithms, Benchmarking, Cluster computing, CUDA, Electrodynamics, FDTD, Finite-difference time-domain, Heterogeneous systems, MPI, nVidia, nVidia GeForce GTX 280, nVidia GeForce GTX 470, nVidia GeForce GTX 760, Physics, Tesla K20, Thesis, Ultrasound

October 10, 2014 by hgpu
