high performance computing on graphics processing units: hgpu.org

An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU

Andrew Davidson, Yao Zhang, John D. Owens

University of California, Davis

IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2011

DOI:10.1109/IPDPS.2011.92

@inproceedings{davidson2011auto,
  title={An auto-tuned method for solving large tridiagonal systems on the GPU},
  author={Davidson, A. and Zhang, Y. and Owens, J.D.},
  booktitle={Parallel \& Distributed Processing Symposium (IPDPS), 2011 IEEE International},
  pages={956--965},
  year={2011},
  organization={IEEE}
}

We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage nature of our method, together with varied workloads and GPUs of different capabilities, calls for an auto-tuning strategy that carefully selects the switch points between computation stages. In particular, we show two ways to prune the tuning space effectively and thus avoid an impractical exhaustive search: (1) apply algorithmic knowledge to decouple tuning parameters, and (2) estimate search starting points from GPU architecture parameters. We demonstrate that auto-tuning is a powerful tool: it improves performance by up to 5x, saves on average 17% and 32% of execution time over static and dynamic tuning, respectively, and enables our multi-stage solver to outperform the Intel MKL tridiagonal solver on many parallel tridiagonal systems by 6-11x.
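The split-then-solve idea in the abstract can be illustrated with a small CPU-side sketch. The abstract does not name the splitting scheme, so this example assumes parallel cyclic reduction (PCR) steps for the splitting and the sequential Thomas algorithm for each resulting subsystem. The function names and the `split_steps` parameter are hypothetical and stand in for the kind of switch point an auto-tuner would select. It is plain Python, not GPU code, and it assumes a diagonally dominant system with `2**split_steps <= n`.

```python
def thomas(a, b, c, d):
    """Sequential Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]."""
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr_step(a, b, c, d, stride):
    """One PCR step: afterwards equation i couples x[i-2*stride], x[i], x[i+2*stride]."""
    n = len(b)
    na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
    for i in range(n):  # every i is independent, so a GPU could use one thread per i
        lo, hi = i - stride, i + stride
        if lo >= 0:  # eliminate x[i - stride] using equation lo
            k1 = a[i] / b[lo]
            na[i] = -a[lo] * k1
            nb[i] -= c[lo] * k1
            nd[i] -= d[lo] * k1
        if hi < n:  # eliminate x[i + stride] using equation hi
            k2 = c[i] / b[hi]
            nc[i] = -c[hi] * k2
            nb[i] -= a[hi] * k2
            nd[i] -= d[hi] * k2
    return na, nb, nc, nd


def multistage_solve(a, b, c, d, split_steps):
    """Split into 2**split_steps independent systems, then solve each with Thomas.

    split_steps is a hypothetical tuning parameter (assumes 2**split_steps <= len(b)).
    """
    stride = 1
    for _ in range(split_steps):
        a, b, c, d = pcr_step(a, b, c, d, stride)
        stride *= 2
    x = [0.0] * len(b)
    for r in range(stride):  # subsystem r holds the unknowns with index r mod stride
        x[r::stride] = thomas(a[r::stride], b[r::stride], c[r::stride], d[r::stride])
    return x
```

With `split_steps = 0` the sketch reduces to a single Thomas solve. Each extra step doubles the number of independent subsystems and halves their size, which is the trade-off a tuner would weigh against the amount of on-chip memory available.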

Tags: Algorithms, Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce GTX 280, nVidia GeForce GTX 470

October 16, 2011 by hgpu
