Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

hgpu.org » Applications » Computer science » Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

Bogdan Oancea, Tudorel Andrei

"Nicolae Titulescu" University of Bucharest

Computational Methods in Social Sciences (CMSS), Vol. I, Issue 2/2013, 2013

BibTeX

Download (PDF)

View

Source

1789

views

Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general purpose computations on GPUs and a great number of applications were ported to CUDA obtaining speedups of orders of magnitude comparing to optimized CPU implementations. Hybrid approaches that combine the message passing model with the shared memory model for parallel computing are a solution for very large applications. We considered a heterogeneous cluster that combines the CPU and GPU computations using MPI and CUDA for developing a high performance linear algebra library. Our library deals with large linear systems solvers because they are a common problem in the fields of science and engineering. Direct methods for computing the solution of such systems can be very expensive due to high memory requirements and computational cost. An efficient alternative are iterative methods which computes only an approximation of the solution. In this paper we present an implementation of a library that uses a hybrid model of computation using MPI and CUDA implementing both direct and iterative linear systems solvers. Our library implements LU and Cholesky factorization based solvers and some of the non-stationary iterative methods using the MPI/CUDA combination. We compared the performance of our MPI/CUDA implementation with classic programs written to be run on a single CPU.

Tags: Computer science, CUDA, Factorization, Heterogeneous systems, Linear Algebra, Memory model, MPI, nVidia, nVidia GeForce GTX 280

December 29, 2013 by hgpu

No votes yet.

Please wait...

high performance computing on graphics processing units: hgpu.org

Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

Share this:

Recent source codes

Most viewed papers (last 30 days)