Hydrodynamic Computation with Hybrid Programming on CPU-GPU Clusters


Tingxing Dong, Veselin Dobrev, Tzanio Kolev, Robert Rieben, Stanimire Tomov, Jack Dongarra

Innovative Computing Laboratory, University of Tennessee, Knoxville; Lawrence Livermore National Laboratory

University of Tennessee, 2013

@article{dong2013hydrodynamic,
  title={Hydrodynamic Computation with Hybrid Programming on CPU-GPU Clusters},
  author={Dong, Tingxing and Dobrev, Veselin and Kolev, Tzanio and Rieben, Robert and Tomov, Stanimire and Dongarra, Jack},
  year={2013}
}


The explosion of parallelism and heterogeneity in today’s computer architectures has created opportunities as well as challenges for redesigning legacy numerical software to harness the power of new hardware. In this paper we address the main challenges in redesigning BLAST, a numerical library that solves the equations of compressible hydrodynamics using high order finite element methods (FEM) in a moving Lagrangian frame, to support CPU-GPU clusters. We use a hybrid MPI + OpenMP + CUDA programming model that includes two layers: domain-decomposed MPI parallelization and OpenMP + CUDA acceleration within a given domain. To optimize the code, we implemented custom linear algebra kernels and introduced an auto-tuning technique to deal with heterogeneity and load balancing at runtime. Our tests show that 12 Intel Xeon cores and two M2050 GPUs deliver a 24x speedup compared to a single core, and a 2.5x speedup compared to 12 MPI tasks in one node. Further, we achieve perfect weak scaling, demonstrated on a cluster with up to 64 GPUs in 32 nodes. Our choice of programming model and our proposed solutions for parallelism and load balancing specifically target high order FEM discretizations, and can be used equally successfully for applications beyond hydrodynamics. A major accomplishment is that we further establish the appeal of high order FEMs, which, despite their better approximation properties, are often avoided due to their high computational cost. GPUs, as we show, have the potential to make them the method of choice, as the increased computational cost is also localized, e.g., cast as Level 3 BLAS, and thus can be done very efficiently (close to "free" relative to the usual overheads inherent in sparse computations).
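The abstract mentions runtime auto-tuning for CPU-GPU load balancing but does not describe the algorithm. As a rough illustration only, the following Python sketch shows one common way such a balance could be tuned: measure how long the CPU and GPU portions of a step take, then reassign work in proportion to the observed throughput. The function names (`rebalance`, `autotune`, `toy_measure`) and the linear cost model are assumptions made for this example; they are not taken from the paper or from BLAST.

```python
def rebalance(gpu_fraction, t_gpu, t_cpu):
    """Return an updated GPU work share from one step's measured timings.

    gpu_fraction: share of the work (strictly between 0 and 1) given to the GPU.
    t_gpu, t_cpu: measured wall times (seconds, > 0) of the two portions.
    """
    if not 0.0 < gpu_fraction < 1.0:
        raise ValueError("gpu_fraction must be strictly between 0 and 1")
    if t_gpu <= 0.0 or t_cpu <= 0.0:
        raise ValueError("timings must be positive")
    # Observed throughput: share of the total work completed per second.
    gpu_rate = gpu_fraction / t_gpu
    cpu_rate = (1.0 - gpu_fraction) / t_cpu
    # Split the work in proportion to throughput so both sides finish together.
    return gpu_rate / (gpu_rate + cpu_rate)


def autotune(measure, gpu_fraction=0.5, max_steps=10, tol=0.01):
    """Iterate rebalance() until the GPU share changes by less than tol.

    measure: hypothetical callback that runs one step with the given split
             and returns (t_gpu, t_cpu).
    """
    for _ in range(max_steps):
        t_gpu, t_cpu = measure(gpu_fraction)
        new_fraction = rebalance(gpu_fraction, t_gpu, t_cpu)
        if abs(new_fraction - gpu_fraction) < tol:
            return new_fraction
        gpu_fraction = new_fraction
    return gpu_fraction


def toy_measure(gpu_fraction):
    """Stand-in for real timings: pretend the GPU is 4x faster than the CPU."""
    return gpu_fraction / 4.0, (1.0 - gpu_fraction) / 1.0


if __name__ == "__main__":
    print(round(autotune(toy_measure), 3))
```

With the toy timing model, the split settles at the point where both portions take the same time. Real timings are noisy and rarely linear in the work size, so a practical tuner would likely smooth the measurements over several steps.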

Tags: CUDA, FEM, Finite element method, Fluid dynamics, GPU cluster, Linear Algebra, MPI, nVidia, Tesla M2050, Tesla M2090

July 15, 2013 by hgpu
