high performance computing on graphics processing units: hgpu.org


GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models

Axel Modave, Amik St-Cyr, Tim Warburton

Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA

arXiv:1602.07997 [physics.comp-ph], (25 Feb 2016)

@article{modave2016performance,
  title={GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models},
  author={Modave, Axel and St-Cyr, Amik and Warburton, Tim},
  year={2016},
  month={feb},
  archivePrefix={arXiv},
  primaryClass={physics.comp-ph}
}


Finite element schemes based on discontinuous Galerkin methods possess features amenable to massively parallel computing accelerated with general-purpose graphics processing units (GPUs). However, the computational performance of such schemes depends strongly on their implementation. Several implementation strategies have been proposed in the past: some rely exclusively on specialized compute kernels tuned for each operation, while others leverage BLAS libraries that provide optimized routines for basic linear algebra operations. In this paper, we present and analyze up-to-date performance results for different implementations, tested in a unified framework on a single NVIDIA GTX980 GPU. We show that specialized kernels written with a one-node-per-thread strategy are competitive for polynomial bases up to the fifth and seventh degrees for acoustic and elastic models, respectively. For higher degrees, a strategy that makes use of the NVIDIA cuBLAS library provides better results, reaching a net arithmetic throughput of 35.7% of the theoretical peak value.
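The two strategies contrasted in the abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' code and does not run on a GPU. It assumes that every element shares one dense reference operator `D` (for example a derivative matrix) and that the nodal values are stored as a matrix `U` with one row per node and one column per element. The names `node_thread`, `apply_per_node`, and `apply_as_gemm` are hypothetical, and the plain Python loops only stand in for a CUDA kernel and a cuBLAS matrix-matrix product.

```python
def node_thread(D, U, elem, node):
    """Work of one hypothetical GPU thread: one output value for one node of one element."""
    acc = 0.0
    for j in range(len(U)):
        acc += D[node][j] * U[j][elem]
    return acc


def apply_per_node(D, U):
    """One-node-per-thread view: each (node, element) pair is computed independently."""
    n_nodes, n_elems = len(D), len(U[0])
    return [[node_thread(D, U, e, i) for e in range(n_elems)] for i in range(n_nodes)]


def apply_as_gemm(D, U):
    """BLAS view: the same work expressed as a single matrix-matrix product D * U.

    The list comprehension is only a stand-in for a library GEMM routine.
    """
    columns = list(zip(*U))  # one tuple of nodal values per element
    return [[sum(d * u for d, u in zip(row, col)) for col in columns] for row in D]


# Toy data (assumed for illustration): 2 nodes per element, 3 elements.
D = [[0.0, 1.0], [1.0, 0.0]]
U = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
result_per_node = apply_per_node(D, U)
result_gemm = apply_as_gemm(D, U)
```

Both functions produce the same values; they differ only in how the work is organized. The first maps naturally onto hand-tuned kernels, while the second hands the whole operation to an optimized library routine, which is the trade-off the paper measures across polynomial degrees.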

Tags: BLAS, Computational Physics, CUBLAS, CUDA, FEM, Finite element method, Geoscience, Linear Algebra, nVidia, nVidia GeForce GTX 980, OCCA, Physics, Profiling, Seismic modeling, Seismology

March 1, 2016 by hgpu

