high performance computing on graphics processing units: hgpu.org

Finite Element Integration with Quadrature on the GPU

Matthew G. Knepley, Karl Rupp, Andy R. Terrel

Department of Computational and Applied Mathematics, Rice University, Houston, TX

arXiv:1607.04245 [cs.MS], (14 Jul 2016)

We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call thread transposition to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal single precision peak flop rate of 1.5 TF/s and a memory bandwidth of 192 GB/s, we achieve close to 300 GF/s for element integration on first-order discretization of the Laplacian operator with variable coefficients in two dimensions, and over 400 GF/s in three dimensions. From our performance model we find that this corresponds to 90% of our measured achievable bandwidth peak of 310 GF/s. Further experimental results also match the predicted performance when used with double precision (120 GF/s in two dimensions, 150 GF/s in three dimensions). Results obtained for the linear elasticity equations (220 GF/s and 70 GF/s in two dimensions, 180 GF/s and 60 GF/s in three dimensions) also demonstrate the applicability of our method to vector-valued partial differential equations.
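
To make the quantity being integrated concrete, here is a minimal serial sketch in plain Python of quadrature-based element integration for a first-order (P1) triangle and the Laplacian with a variable coefficient. It is not the authors' GPU kernel and does not attempt the thread transposition pattern, which this abstract does not describe in detail. The function name `element_laplacian`, the 3-point quadrature rule, and the coefficient callback are assumptions chosen for illustration.

```python
# Hypothetical illustration: P1 element stiffness matrix by quadrature.
# Reference triangle has vertices (0,0), (1,0), (0,1).

# Gradients of the three P1 basis functions on the reference triangle.
REF_GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]

# 3-point rule on the reference triangle (exact for quadratics);
# the weights sum to the reference area 1/2.
QUAD_POINTS = [(1.0 / 6.0, 1.0 / 6.0), (2.0 / 3.0, 1.0 / 6.0), (1.0 / 6.0, 2.0 / 3.0)]
QUAD_WEIGHTS = [1.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0]


def element_laplacian(verts, coeff):
    """Return the 3x3 matrix K_ij = int k(x) grad(phi_i) . grad(phi_j) dx.

    verts: three (x, y) vertex tuples of the triangle.
    coeff: callable k(x, y) giving the variable coefficient.
    """
    (x0, y0), (x1, y1), (x2, y2) = verts

    # Jacobian of the affine map from the reference to the physical triangle.
    j00, j01 = x1 - x0, x2 - x0
    j10, j11 = y1 - y0, y2 - y0
    det = j00 * j11 - j01 * j10

    # Inverse-transpose Jacobian maps reference gradients to physical ones.
    it00, it01 = j11 / det, -j10 / det
    it10, it11 = -j01 / det, j00 / det
    grads = [(it00 * gx + it01 * gy, it10 * gx + it11 * gy) for gx, gy in REF_GRADS]

    K = [[0.0] * 3 for _ in range(3)]
    for (xi, eta), w in zip(QUAD_POINTS, QUAD_WEIGHTS):
        # Physical coordinates of the quadrature point.
        x = x0 + j00 * xi + j01 * eta
        y = y0 + j10 * xi + j11 * eta
        scale = w * coeff(x, y) * abs(det)
        # Accumulate the contribution of this quadrature point.
        for i in range(3):
            for j in range(3):
                K[i][j] += scale * (grads[i][0] * grads[j][0] + grads[i][1] * grads[j][1])
    return K
```

Calling `element_laplacian(((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)), lambda x, y: 1.0 + x)` integrates the reference triangle with a linearly varying coefficient. Each row of the result sums to zero, since the basis functions sum to one and the gradient of a constant vanishes. In this sketch the sum over quadrature points is a sequential accumulation; on a GPU that sum is where a reduction would otherwise appear.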

Tags: AMD FirePro W9100, ATI, Computer science, Differential equations, Mathematical Software, nVidia, nVidia GeForce GTX 580, nVidia GeForce GTX 750 Ti, OpenCL, Partial differential equations, PDEs, Tesla K20

July 16, 2016 by hgpu
