An N log N Parallel Fast Direct Solver for Kernel Matrices
Department of Computer Science, The University of Texas at Austin, Austin, Texas, USA
arXiv:1701.02324 [cs.DC], (9 Jan 2017)
@article{yu2017parallel,
title={An $N \log N$ Parallel Fast Direct Solver for Kernel Matrices},
author={Yu, Chenhan D. and March, William B. and Biros, George},
year={2017},
month={jan},
eprint={1701.02324},
archivePrefix={arXiv},
primaryClass={cs.DC}
}
Kernel matrices appear in machine learning and non-parametric statistics. Given $N$ points in $d$ dimensions and a kernel function that requires $\mathcal{O}(d)$ work to evaluate, we present an $\mathcal{O}(dN\log N)$-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization, solving a linear system with a kernel matrix can be done with $\mathcal{O}(N\log N)$ work. Our algorithm requires only kernel evaluations and does not require that the kernel matrix admit an efficient global low-rank approximation. Instead, our factorization only assumes low-rank properties for the off-diagonal blocks under an appropriate row and column ordering. We also present a hybrid method that, when the factorization is prohibitively expensive, combines a partial factorization with iterative methods. As a highlight, we are able to approximately factorize a dense 11M × 11M kernel matrix in 2 minutes on 3,072 x86 "Haswell" cores and a 4.5M × 4.5M matrix in 1 minute using 4,352 "Knights Landing" cores.
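The key structural assumption above — that off-diagonal blocks of the kernel matrix are numerically low-rank under a suitable ordering — can be checked numerically. The sketch below is illustrative only and is not the paper's algorithm: it uses hypothetical choices (random 2-D points, a Gaussian kernel, a 1e-6 tolerance, sorting along one coordinate as a crude stand-in for the paper's row/column ordering) and simply measures the numerical rank of one off-diagonal block with a dense SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): random 2-D points, sorted along
# the first coordinate as a stand-in for an "appropriate row and column
# ordering".
N, d = 1024, 2
X = rng.standard_normal((N, d))
X = X[np.argsort(X[:, 0])]

# A Gaussian kernel matrix; the algorithm described in the abstract needs
# only entry-wise kernel evaluations like these.
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / 4.0)

# Off-diagonal block coupling the two halves of the ordering.
B = K[: N // 2, N // 2:]

# Numerical rank at a 1e-6 relative tolerance: far below N/2, which is the
# property hierarchical factorizations exploit.
U, s, Vt = np.linalg.svd(B, full_matrices=False)
rank = int(np.sum(s > 1e-6 * s[0]))

# Truncating the SVD at that rank reproduces B to ~1e-6 relative accuracy
# in the spectral norm (Eckart-Young).
B_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]
rel_err = np.linalg.norm(B - B_r, 2) / s[0]
print(rank, rel_err)
```

Running this shows the 512 × 512 off-diagonal block compressing to a rank far below its dimension, while the diagonal blocks (near-field interactions) remain full-rank — which is why the method needs only hierarchical off-diagonal low rank rather than a global low-rank approximation of the whole matrix.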
January 16, 2017 by hgpu