high performance computing on graphics processing units: hgpu.org


Locality optimization on a NUMA architecture for hybrid LU factorization

Adrien Remy, Marc Baboulin, Masha Sosonkina, Brigitte Rozoy

Inria and Université Paris-Sud, France

hal-00957673, (10 March 2014)

@techreport{remy:hal-00957673,
  hal_id={hal-00957673},
  url={http://hal.inria.fr/hal-00957673},
  title={Locality optimization on a NUMA architecture for hybrid LU factorization},
  author={R{\'e}my, Adrien and Baboulin, Marc and Sosonkina, Masha and Rozoy, Brigitte},
  keywords={ccNUMA; thread placement; dense linear systems; LU factorization; MAGMA library},
  language={Anglais},
  affiliation={Laboratoire de Recherche en Informatique – LRI , POSTALE – INRIA Saclay – Ile de France, Old Dominion University – ODU},
  type={Rapport de recherche},
  institution={INRIA},
  number={RR-8497},
  year={2014},
  month={Mar},
  pdf={http://hal.inria.fr/hal-00957673/PDF/RR-8497.pdf}
}


We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular, we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We apply these placement strategies and present performance results for a hybrid multicore/GPU LU algorithm as it is implemented in the public domain library MAGMA.
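To make the idea of thread placement on a NUMA machine concrete, here is a minimal Python sketch. It is not taken from the report or from MAGMA. The function name `place_threads`, the policy names `compact` and `scatter`, and the assumption that core IDs are numbered contiguously within each NUMA node are all illustrative; real machines often number cores differently, and the report's own strategies may differ.

```python
def place_threads(num_threads, num_nodes, cores_per_node, policy="compact"):
    """Map each thread index to a (numa_node, core_id) pair.

    Hypothetical helper for illustration only. Assumes core IDs are
    contiguous per node: node k owns cores
    k*cores_per_node .. (k+1)*cores_per_node - 1.
    """
    total_cores = num_nodes * cores_per_node
    if num_threads > total_cores:
        raise ValueError("more threads than available cores")

    placement = []
    for t in range(num_threads):
        if policy == "compact":
            # Fill one NUMA node completely before moving to the next,
            # which keeps cooperating threads close to the same memory.
            node, local = divmod(t, cores_per_node)
        elif policy == "scatter":
            # Deal threads round-robin across nodes, which spreads
            # the load over all memory controllers.
            node, local = t % num_nodes, t // num_nodes
        else:
            raise ValueError("unknown policy: " + policy)
        placement.append((node, node * cores_per_node + local))
    return placement


if __name__ == "__main__":
    # Four threads on a hypothetical machine with 2 nodes of 4 cores each.
    print(place_threads(4, 2, 4, "compact"))
    print(place_threads(4, 2, 4, "scatter"))
```

On Linux, a mapping like this could then be applied per thread with a call such as `os.sched_setaffinity`, with memory allocated or first touched by the thread that will use it; which policy helps the panel factorization depends on the machine and is what the report measures.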

Tags: Algorithms, Computer science, CUDA, Factorization, Linear Algebra, nVidia, Tesla S2050

March 12, 2014 by hgpu

