Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms

Pablo Quesada-Barriuso, Julian Lamas-Rodriguez, Dora B. Heras, Montserrat Boo, Francisco Arguello

Centro de Investigacion en Tecnoloxias da Informacion (CITIUS), Univ. of Santiago de Compostela, Spain

The 2011 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’11), 2011

Multi-core processors and graphics cards are now commodity hardware found in personal computers, and both CPUs and GPUs are capable of high-end computation. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method as an example of fine-grained parallelism and Bondeli’s algorithm as an example of coarse-grained parallelism. Both algorithms are implemented for GPU architectures using CUDA and for shared-memory multi-core CPU architectures using OpenMP. The results are compared in terms of execution time, speedup, and GFLOPS. For a large system of 2^22 equations, Bondeli’s algorithm gave the best results on multi-core CPU platforms (speedup 1.55x and 0.84 GFLOPS), while cyclic reduction was the best on GPU platforms (speedup 17.06x and 5.09 GFLOPS).
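To make the cyclic reduction idea concrete, here is a minimal serial sketch in Python. It is not the paper's CUDA or OpenMP implementation; the function name `cyclic_reduction` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) are assumptions chosen for illustration. Each step eliminates the even-indexed unknowns, solves the half-size system for the odd-indexed ones, and then back-substitutes. The sketch does no pivoting, so it assumes a well-conditioned (for example diagonally dominant) system.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction (illustrative sketch).

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] lie outside the matrix and are ignored.
    """
    n = len(b)
    if n == 0:
        return []
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: fold each even-indexed neighbour row into the
    # odd-indexed row between them. Iterations are independent of each
    # other, which is the source of the fine-grained parallelism.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0.0))

    # Solve the half-size tridiagonal system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: recover each even-indexed unknown from its neighbours.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


if __name__ == "__main__":
    # Small diagonally dominant example: 4 on the diagonal, 1 off the diagonal.
    n = 8
    a = [0.0] + [1.0] * (n - 1)
    b = [4.0] * n
    c = [1.0] * (n - 1) + [0.0]
    d = [1.0] * n
    print(cyclic_reduction(a, b, c, d))
```

In this sketch the loop bodies in both the reduction and the back-substitution are independent across `i`, so each could in principle be mapped to one GPU thread or one OpenMP loop iteration; the recursion depth is about log2(n) levels.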

Tags: Algorithms, Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce GTX 295, OpenMP

January 5, 2012 by hgpu
