high performance computing on graphics processing units: hgpu.org

Finite element assembly strategies on multi- and many-core architectures

G. R. Markall, A. Slemmer, D. A. Ham, P. H. J. Kelly, C. D. Cantwell, S. J. Sherwin

Department of Computing, Imperial College London

International Journal for Numerical Methods in Fluids, 2011

We demonstrate that radically differing implementations of finite element methods are needed on multi-core (CPU) and many-core (GPU) architectures if their respective performance potential is to be realised. Our experimental investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and diverse algorithmic choices that cut across the high-level structure of the implementation. Making these commitments to achieve high performance on a single architecture leads to a loss of performance portability. Data structures that include redundant data but enable coalesced memory accesses are faster on many-core architectures, whereas redundancy-free data structures that are accessed indirectly are faster on multi-core architectures. The Addto algorithm for global assembly is optimal on multi-core architectures, whereas the Local Matrix Approach is optimal on many-core architectures despite requiring more computation than the Addto algorithm. These results demonstrate the value of making the correct choice of algorithm and data structure when implementing finite element methods, spectral element methods and low-order discontinuous Galerkin methods on modern high-performance architectures.
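
The abstract contrasts the two assembly strategies without showing them. The following is a minimal sketch under our own assumptions, not the authors' code (which targets CPUs and GPUs through CUDA and OpenCL): a 1D mesh of linear (P1) elements, the diffusion (stiffness) operator only, and hypothetical function names such as `addto_assemble` and `lma_matvec`. It shows only the structural difference. Addto sums element contributions into a single global sparse matrix that is later accessed through indices, while the Local Matrix Approach keeps every element matrix, duplicating entries for shared nodes, and applies the operator as gather, local multiply, scatter.

```python
"""Toy contrast of Addto and Local Matrix Approach (LMA) assembly.

Hypothetical sketch: the 1D P1 mesh, the stiffness-only operator and all
function names are illustrative assumptions, not the paper's implementation.
"""
from collections import defaultdict


def local_stiffness(h):
    # P1 element stiffness matrix for -u'' on an element of length h
    k = 1.0 / h
    return [[k, -k], [-k, k]]


def make_mesh(n_elem, length=1.0):
    # Element-to-node map (connectivity): element e touches nodes e and e+1
    h = length / n_elem
    conn = [(e, e + 1) for e in range(n_elem)]
    return conn, h


def addto_assemble(conn, h):
    # Addto: add each local matrix into one global sparse matrix.
    # Contributions at shared nodes are summed, so nothing is stored twice,
    # but every later access goes through (row, col) indirection.
    A = defaultdict(float)
    for nodes in conn:
        Ke = local_stiffness(h)
        for i, gi in enumerate(nodes):
            for j, gj in enumerate(nodes):
                A[(gi, gj)] += Ke[i][j]
    return A


def addto_matvec(A, x):
    # y = A x using the assembled global sparse matrix
    y = [0.0] * len(x)
    for (gi, gj), v in A.items():
        y[gi] += v * x[gj]
    return y


def lma_assemble(conn, h):
    # LMA: keep one dense matrix per element; entries for shared nodes are
    # stored redundantly, but each element's data has the same size and shape.
    return [local_stiffness(h) for _ in conn]


def lma_matvec(local_mats, conn, x):
    # Apply the operator without a global matrix: gather, multiply, scatter-add
    y = [0.0] * len(x)
    for Ke, nodes in zip(local_mats, conn):
        xe = [x[g] for g in nodes]                        # gather
        ye = [sum(Ke[i][j] * xe[j] for j in range(2))     # local multiply
              for i in range(2)]
        for i, g in enumerate(nodes):                     # scatter-add
            y[g] += ye[i]
    return y


if __name__ == "__main__":
    conn, h = make_mesh(4)
    x = [float(i * i) for i in range(5)]
    print(addto_matvec(addto_assemble(conn, h), x))  # [-4.0, -8.0, -8.0, -8.0, 28.0]
    print(lma_matvec(lma_assemble(conn, h), conn, x))  # [-4.0, -8.0, -8.0, -8.0, 28.0]
```

Both strategies give the same operator action. On 4 elements, however, the Addto matrix holds 13 nonzeros while LMA stores 16 local entries, because each interior diagonal entry is kept once per adjacent element, and LMA repeats the local multiplies on every application. In a GPU implementation, the uniform per-element layout is what can make coalesced memory access possible, which is the trade-off the abstract describes.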

Tags: Algorithms, ATI, ATI Radeon HD 5870, CUDA, FEM, Finite element method, Fluid dynamics, nVidia, nVidia GeForce GTX 280, nVidia GeForce GTX 480, OpenCL, Spectral elements

October 4, 2011 by hgpu