high performance computing on graphics processing units: hgpu.org

Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU

Luc Buatois, Guillaume Caumon, Bruno Levy

Gocad Research Group, INRIA, Nancy Universite, France

High Performance Computing and Communications, Lecture Notes in Computer Science, 2007, Volume 4782/2007, 358-371

DOI:10.1007/978-3-540-75444-2_37

A wide class of geometry processing and PDE resolution methods needs to solve a linear system in which the non-zero pattern of the matrix is dictated by the connectivity of the mesh. The advent of GPUs, with their ever-growing parallel computing power, makes them a tempting resource for such numerical computations. New APIs (CTM from ATI and CUDA from NVIDIA) help by giving direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. However, existing GPU linear solvers are restricted to specific types of matrices, or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a general-purpose sparse linear solver that outperforms leading-edge CPU counterparts (MKL / ACML).
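The two storage strategies named in the abstract can be illustrated with a small CPU-side sketch. It is not the authors' GPU implementation: the function names `dense_to_bcrs` and `bcrs_spmv`, the flat row-major block layout, and the use of a dense input matrix are all assumptions made for illustration. Block compressed row storage (BCRS) keeps only the non-zero `bs x bs` blocks of the matrix, with one column index per block rather than per scalar. Register blocking is mimicked by the small accumulator `acc`, which on real hardware would live in registers while a block row is processed.

```python
def dense_to_bcrs(dense, bs):
    """Convert a dense matrix (list of lists) to BCRS with bs x bs blocks.

    Returns (row_ptr, col_idx, blocks). Hypothetical helper for illustration.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    assert n_rows % bs == 0 and n_cols % bs == 0, "dimensions must be multiples of bs"
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(n_rows // bs):
        for bj in range(n_cols // bs):
            # Flatten the block in row-major order.
            block = [dense[bi * bs + r][bj * bs + c]
                     for r in range(bs) for c in range(bs)]
            if any(v != 0 for v in block):  # store non-zero blocks only
                col_idx.append(bj)
                blocks.append(block)
        row_ptr.append(len(col_idx))  # end of this block row
    return row_ptr, col_idx, blocks


def bcrs_spmv(row_ptr, col_idx, blocks, x, bs):
    """Compute y = A * x for a matrix A stored in BCRS."""
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * bs)
    for bi in range(n_block_rows):
        acc = [0.0] * bs  # stands in for per-thread registers
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            block = blocks[k]
            xs = x[bj * bs:(bj + 1) * bs]  # fetched once, reused by bs rows
            for r in range(bs):
                for c in range(bs):
                    acc[r] += block[r * bs + c] * xs[c]
        y[bi * bs:(bi + 1) * bs] = acc
    return y
```

Compared with scalar compressed row storage, each stored column index here covers `bs * bs` coefficients, and each slice of `x` is reused across `bs` rows, which is the kind of index and memory-traffic saving the abstract alludes to. The trade-off is that any zeros inside a stored block are kept explicitly.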

Tags: 3D Graphics and Realism, ATI, ATI Radeon X1900, Computer science, Linear Algebra, OpenGL, Sparse matrix

December 22, 2010 by hgpu
