high performance computing on graphics processing units: hgpu.org


Parallel Implementation of the Finite Element Method on Graphics Processors for the Solution of Incompressible Flows

Mahmut Murat Gocmen

Middle East Technical University

Middle East Technical University, 2014

@phdthesis{goccmen2014parallel,
  title={PARALLEL IMPLEMENTATION OF THE FINITE ELEMENT METHOD ON GRAPHICS PROCESSORS FOR THE SOLUTION OF INCOMPRESSIBLE FLOWS},
  author={G{\"O}{\c{C}}MEN, MAHMUT MURAT},
  year={2014},
  school={MIDDLE EAST TECHNICAL UNIVERSITY}
}

Package: CFD with CUDA: A Finite Element Based Flow Solver Using CUDA

In recent years, the clock speeds and memory bandwidths of Graphics Processing Units (GPUs) have increased dramatically compared to those of CPUs. GPU vendors have also developed and freely released new programming tools that make scientific computing on GPUs easier. With these developments, general-purpose computing on GPUs has become a popular research field. Researchers have demonstrated that GPUs can provide speed-ups of tens of times over CPU solvers for CFD methods such as Smoothed Particle Hydrodynamics, Lattice Boltzmann and Discontinuous Galerkin, which are known to offer very high parallelization potential. However, studies on the use of GPUs for classical finite volume, and especially finite element, CFD codes are rare in the literature. This study develops a flow solver based on the Finite Element Method (FEM) that runs in parallel on GPUs. The CUDA (Compute Unified Device Architecture) toolkit developed by NVIDIA is used for GPU programming. Three-dimensional, laminar, incompressible flows with possible heat transfer effects are considered. The governing equations are discretized using two different fractional step algorithms, one explicit and one implicit. The accuracy of the solver is tested on five benchmark problems, including a microchannel flow and flow inside a tube with conjugate heat transfer. The run-time performance of each step of the fractional step algorithm is investigated in detail on both the CPU and the GPU. Speed-up tests are performed on a series of meshes with between 700,000 and 6.7 million unknowns in total. Parallelization on the CPU is achieved with Intel’s MKL library and OpenMP; on the GPU, mostly the CUBLAS, CUSPARSE and CUSP libraries are used, supplemented by custom-written GPU kernels where necessary. For the largest mesh tested, the GPU gave speed-ups of 5.79 and 1.86 times over single-thread and 8-thread CPU solutions, respectively.
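The two reported speed-ups can be combined to estimate how well the 8-thread CPU run scaled on the largest mesh. The notation $T_1$, $T_8$ and $T_{\mathrm{GPU}}$ (single-thread, 8-thread and GPU run times) and the efficiency $E_8$ are introduced here for illustration, assuming both ratios are measured against the same GPU run:

```latex
\frac{T_1}{T_{\mathrm{GPU}}} = 5.79, \qquad
\frac{T_8}{T_{\mathrm{GPU}}} = 1.86
\quad\Longrightarrow\quad
\frac{T_1}{T_8} = \frac{5.79}{1.86} \approx 3.11, \qquad
E_8 = \frac{T_1}{8\,T_8} \approx \frac{3.11}{8} \approx 0.39
```

Under that assumption, the 8-thread CPU solution was roughly 3.1 times faster than the single-thread one (a parallel efficiency of about 39%), which is why the GPU advantage shrinks from 5.79 to 1.86 times against the multi-threaded baseline.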
The use of single precision arithmetic is investigated from the points of view of accuracy and efficiency. It is found not to degrade accuracy, while providing almost a 2 times speed-up on both the CPU and the GPU. Compared to the explicit version, the implicit fractional step algorithm turned out to be advantageous in terms of run time for steady-state problems. The explicit method, on the other hand, uses less memory, as expected.
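Fractional step methods typically require a sparse symmetric positive definite solve for the pressure at each step, and CUSP provides a conjugate gradient solver for sparse matrices such as those in CSR format, which CUSPARSE also uses. The abstract does not show the solver's code, so the following is only a minimal CPU sketch in Python/NumPy, not the thesis's implementation. It assumes a conjugate gradient (CG) pressure solve and uses a hypothetical 1D Poisson matrix (`poisson_1d_csr`) as a stand-in for an FEM matrix. The helper names `csr_matvec` and `conjugate_gradient` are likewise illustrative. Passing `float32` or `float64` matrix values switches the whole solve between single and double precision, so the two results and the matrix storage sizes can be compared.

```python
import numpy as np


def poisson_1d_csr(n, dtype):
    """Hypothetical stand-in for an FEM pressure matrix: tridiag(-1, 2, -1) in CSR form.

    Returns (indptr, indices, data); the matrix is symmetric positive definite.
    """
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:  # drop entries that fall outside the matrix
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))  # row i ends here
    return np.array(indptr), np.array(indices), np.array(data, dtype=dtype)


def csr_matvec(indptr, indices, data, x):
    """y = A @ x for A stored in CSR (compressed sparse row) form."""
    n = len(indptr) - 1
    y = np.zeros(n, dtype=data.dtype)
    for row in range(n):
        start, end = indptr[row], indptr[row + 1]
        # Dot the row's nonzero values with the matching entries of x.
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    return y


def conjugate_gradient(indptr, indices, data, b, tol=1e-6, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A.

    Every vector takes the dtype of `data`, so float32 or float64 matrix
    values select single or double precision arithmetic throughout.
    Stops when ||r|| <= tol * ||b|| or after max_iter iterations.
    """
    b = b.astype(data.dtype)
    x = np.zeros_like(b)
    r = b - csr_matvec(indptr, indices, data, x)  # initial residual
    p = r.copy()  # first search direction
    rs_old = np.dot(r, r)
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / np.dot(p, Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.dot(r, r)
        p = r + (rs_new / rs_old) * p  # next A-conjugate direction
        rs_old = rs_new
    return x


if __name__ == "__main__":
    n = 20
    indptr, indices, d64 = poisson_1d_csr(n, np.float64)
    _, _, d32 = poisson_1d_csr(n, np.float32)
    b = np.ones(n)
    x64 = conjugate_gradient(indptr, indices, d64, b, tol=1e-10)
    x32 = conjugate_gradient(indptr, indices, d32, b)
    print("matrix value bytes, double vs single:", d64.nbytes, d32.nbytes)
    print("max relative difference:", np.max(np.abs(x32 - x64)) / np.max(np.abs(x64)))
```

The serial loop in `csr_matvec` is the operation a GPU solver would hand to CUSPARSE or a custom kernel. Single precision halves the bytes per stored value; one common explanation for near-2x gains in sparse solvers is that such kernels are memory-bandwidth bound, though the abstract does not state the cause.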

Tags: Algorithms, CUBLAS, CUDA, FEM, Finite element method, Fluid dynamics, Lattice Boltzmann model, nVidia, nVidia GeForce GTX 280, Package, Tesla C2075, Thesis

January 16, 2015 by hgpu
