high performance computing on graphics processing units: hgpu.org

Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU

Luc Buatois, Guillaume Caumon, Bruno Lévy

Gocad Research Group, INRIA, Nancy Université, France

High Performance Computing and Communications, Lecture Notes in Computer Science, 2007, Volume 4782/2007, 358-371

DOI:10.1007/978-3-540-75444-2_37

A wide class of geometry processing and PDE resolution methods needs to solve a linear system in which the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. The advent of GPUs, with their ever-growing parallel computing power, makes them a tempting resource for such numerical computations. This is facilitated by new APIs (CTM from ATI and CUDA from NVIDIA), which give direct access to the multithreaded computational resources and associated memory bandwidth of GPUs. CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. However, existing GPU linear solvers are restricted to specific types of matrices, or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a sparse general-purpose linear solver which outperforms leading-edge CPU counterparts (MKL / ACML).
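The two supercomputing strategies named in the abstract, block compressed row storage (BCRS) and register blocking, can be illustrated with a minimal CPU-side sketch of a sparse matrix-vector product, the kernel at the heart of iterative sparse solvers. This is not the authors' GPU implementation: the 2x2 block size, the function names `dense_to_bcrs` and `bcrs_spmv`, and the array names are assumptions chosen for illustration, and plain Python local variables stand in for hardware registers.

```python
# Illustrative sketch only: 2x2 block size and all names are assumptions,
# not taken from the paper.
BLOCK = 2

def dense_to_bcrs(a):
    """Convert a dense square matrix (list of lists) to 2x2 BCRS arrays.

    Returns (values, col_idx, row_ptr):
      values  - non-zero blocks, each stored as 4 row-major entries
      col_idx - block-column index of each stored block
      row_ptr - row_ptr[i]..row_ptr[i+1] spans the blocks of block row i
    """
    n = len(a)
    assert n % BLOCK == 0, "this sketch assumes the dimension is a multiple of 2"
    values, col_idx, row_ptr = [], [], [0]
    for bi in range(n // BLOCK):
        for bj in range(n // BLOCK):
            block = [a[bi * BLOCK + r][bj * BLOCK + c]
                     for r in range(BLOCK) for c in range(BLOCK)]
            if any(v != 0 for v in block):  # keep only blocks with a non-zero
                values.extend(block)
                col_idx.append(bj)
        row_ptr.append(len(col_idx))
    return values, col_idx, row_ptr

def bcrs_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix stored in 2x2 BCRS form."""
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * BLOCK)
    for bi in range(n_block_rows):
        # Register blocking: the two partial sums of this block row live in
        # local accumulators and are written back to y only once.
        y0 = 0.0
        y1 = 0.0
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            # One column index fetch serves all four entries of the block,
            # and each loaded x value is reused for two output rows.
            x0 = x[col_idx[k] * BLOCK]
            x1 = x[col_idx[k] * BLOCK + 1]
            v = k * BLOCK * BLOCK
            y0 += values[v] * x0 + values[v + 1] * x1
            y1 += values[v + 2] * x0 + values[v + 3] * x1
        y[bi * BLOCK] = y0
        y[bi * BLOCK + 1] = y1
    return y
```

Compared with scalar compressed row storage, the blocked layout stores one column index per block instead of one per entry, which reduces index traffic, while the local accumulators avoid repeated reads and writes of the output vector. How these ideas map onto GPU threads and memory is specific to the paper and is not reproduced here.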

Tags: 3D Graphics and Realism, ATI, ATI Radeon X1900, Computer science, Linear Algebra, OpenGL, Sparse matrix

December 22, 2010 by hgpu
