high performance computing on graphics processing units: hgpu.org


Efficient Preconditioned Conjugate Gradient Parallelization on GPU

A. F. P. Camargos, V. C. Silva

Universidade de Sao Paulo – Escola Politecnica, Sao Paulo, Brasil

19th International Conference on the Computation of Electromagnetic Fields (Compumag), 2013

@article{camargos2013efficient,
  title={Efficient Preconditioned Conjugate Gradient Parallelization on GPU},
  author={Camargos, AFP and Silva, VC},
  year={2013}
}


We present a performance analysis of parallel implementations of the conjugate gradient (CG) and preconditioned conjugate gradient (PCG) solvers on graphics processing units (GPUs) using the CUDA parallel programming model. The solvers were optimized for the fast solution of sparse systems of equations arising from finite element analysis (FEA) of electromagnetic phenomena. Two preconditioners were used: incomplete Cholesky factorization and incomplete LU factorization. Results show that the solver with the incomplete Cholesky preconditioner achieved a speedup above 3 over the CPU implementation.
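For readers unfamiliar with the method, the following is a minimal CPU reference sketch of conjugate gradient with a zero-fill incomplete Cholesky preconditioner. It is not the authors' CUDA implementation: it uses dense Python lists for brevity, whereas the paper targets sparse matrices on the GPU, and the function names (`ichol0`, `apply_preconditioner`, `pcg`) and the small test matrix are hypothetical choices made for illustration.

```python
import math

def ichol0(A):
    """Zero-fill incomplete Cholesky: Cholesky restricted to A's nonzero pattern.
    Assumes A is symmetric positive definite; the square root can fail otherwise."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            if A[i][j] == 0.0:
                continue  # no fill-in outside the sparsity pattern of A
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def apply_preconditioner(L, r):
    """Solve (L L^T) z = r by forward and then backward substitution."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for A x = b, starting from x = 0.
    Returns the solution estimate and the number of iterations used."""
    L = ichol0(A)
    x = [0.0] * len(b)
    r = list(b)                       # residual b - A x for x = 0
    z = apply_preconditioner(L, r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_preconditioner(L, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical test problem: 1D Laplacian stencil, exact solution [1, 1, 1, 1].
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 0.0, 0.0, 1.0]
x, iters = pcg(A, b)
```

In a GPU version, the sparse matrix-vector product, the dot products, and the vector updates are the natural candidates for parallel kernels, while the two triangular solves in the preconditioner step are inherently more sequential and are often the harder part to accelerate.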

Tags: Conjugate gradient solver, CUDA, Electrodynamics, Factorization, FEM, Finite element method, nVidia, nVidia GeForce GT 240

March 12, 2014 by hgpu
