Performance and Portability of Accelerated Lattice Boltzmann Applications with OpenACC

E. Calore, A. Gabbana, J. Kraus, S. F. Schifano, R. Tripiccione

Dip. di Fisica e Scienze della Terra, University of Ferrara, and INFN, Ferrara (Italy)

arXiv:1703.00186 [cs.DC], (1 Mar 2017)

DOI:10.1002/cpe.3862

An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power-efficient accelerators. Designing efficient applications for these systems has been troublesome in the past, as accelerators could usually be programmed only with specific programming languages, threatening maintainability, portability and correctness. Several new programming environments try to tackle this problem. Among them, OpenACC offers a high-level approach based on compiler directives that mark regions of existing C, C++ or Fortran code to run on accelerators. This approach directly addresses code portability, leaving support for each accelerator to the compiler, but one has to carefully assess the cost of a portable approach in terms of computing efficiency. In this paper we address precisely this issue, using as a test-bench a massively parallel Lattice Boltzmann algorithm. We first describe our multi-node implementation and optimization of the algorithm, using OpenACC and MPI. We then benchmark the code on a variety of processors, including traditional CPUs and GPUs, and make accurate performance comparisons with other GPU implementations of the same algorithm using CUDA and OpenCL. We also assess the performance impact associated with portable programming, and the actual portability and performance-portability of OpenACC-based applications across several state-of-the-art architectures.
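To illustrate the directive-based style the abstract refers to, here is a minimal sketch in C. It is not taken from the paper: the function name `relax`, its parameters, and the BGK-style update it performs are assumptions chosen only to show how a single OpenACC directive can mark an existing loop for offloading.

```c
#include <stddef.h>

/* Hypothetical helper, not the paper's kernel: relax each value in f
   toward the matching equilibrium value in feq (a BGK-style update),
   f[i] <- f[i] - omega * (f[i] - feq[i]). */
void relax(double *restrict f, const double *restrict feq,
           size_t n, double omega)
{
    /* An OpenACC compiler offloads this loop and moves the arrays
       according to the data clauses. A compiler without OpenACC
       support ignores the pragma and runs the loop on the host. */
    #pragma acc parallel loop copy(f[0:n]) copyin(feq[0:n])
    for (size_t i = 0; i < n; ++i) {
        f[i] -= omega * (f[i] - feq[i]);
    }
}
```

Because the directive is a pragma, the same source file still builds as ordinary serial C, which is the portability property the abstract describes; how much performance that costs is the question the paper measures.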

Tags: AMD FirePro S9150, ATI, Benchmarking, CUDA, Fluid dynamics, Lattice Boltzmann model, MPI, nVidia, OpenACC, OpenCL, Performance, Tesla K80

March 5, 2017 by hgpu
