high performance computing on graphics processing units: hgpu.org

Parallel Implementations of the Cholesky Decomposition on CPUs and GPUs

Joao Paulo Tarasconi Ruschel

Universidade Federal do Rio Grande do Sul, Instituto de Informática

Universidade Federal do Rio Grande do Sul, 2016

@phdthesis{da2016parallel,
  title={Parallel Implementations of the Cholesky Decomposition on CPUs and GPUs},
  author={Ruschel, Joao Paulo Tarasconi},
  year={2016},
  school={UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL}
}

As Central Processing Units (CPUs) and Graphics Processing Units (GPUs) steadily improve, different approaches and designs for implementing data-intensive algorithms must be studied and compared. This work compares several algorithm designs and parallelization APIs (such as OpenMP, OpenCL, and CUDA) on both CPU and GPU platforms. We used the Cholesky decomposition, a matrix factorization that appears in many linear algebra problems, as the benchmark algorithm because it is easily parallelizable yet has considerable data dependence between elements. We carried out experiments with the different designs and APIs to find the techniques that yield the best performance on each platform. We also compared these implementations with state-of-the-art solutions (such as LAPACK and cuSOLVER) and provide insights into the differences in implementation and performance. Our experiments showed that, for this particular kind of algorithm, parallelization on the CPU tends to perform better than on the GPU, owing to the algorithm's intrinsically memory-intensive nature and the overhead of memory transfers, and that attempts at code micro-optimization do not offer any significant speedup.
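To make the mix of parallelism and data dependence concrete, here is a minimal serial sketch of the decomposition in plain Python. It is not taken from the thesis: the function name `cholesky_lower`, the column-by-column ordering, and the sample matrix are assumptions chosen for illustration. Each column depends on all previously computed columns, while the entries below the diagonal within one column are independent of each other, which is the loop a parallel API would typically distribute.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L such that L * L^T == a.

    `a` is a symmetric positive-definite matrix given as a list of rows.
    Illustrative column-by-column ordering; not the thesis's implementation.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: needs the already-computed entries of row j.
        d = a[j][j] - sum(L[j][k] * L[j][k] for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j do not depend on each other,
        # so this loop is the natural candidate for a parallel-for or a kernel.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


# Hypothetical 3x3 symmetric positive-definite input.
A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = cholesky_lower(A)
for row in L:
    print(row)
# [2.0, 0.0, 0.0]
# [6.0, 1.0, 0.0]
# [-8.0, 5.0, 3.0]
```

The outer loop over `j` must run in order, so the amount of independent work shrinks as the factorization proceeds; this is one plausible reading of the "considerable data dependence" the abstract mentions.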

Tags: Algorithms, Benchmarking, Computer science, CUDA, Linear Algebra, Matrix decomposition, nVidia, OpenCL, OpenMP, Package, Performance, Tesla K80, Thesis

January 26, 2017 by hgpu
