Advancing the distributed Multi-GPU ChASE library through algorithm optimization and NCCL library
Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Jülich, Germany
arXiv:2309.15595 [cs.DC] (27 Sep 2023)
@misc{wu2023advancing,
title={Advancing the distributed Multi-GPU ChASE library through algorithm optimization and NCCL library},
author={Xinzhe Wu and Edoardo Di Napoli},
year={2023},
eprint={2309.15595},
archivePrefix={arXiv},
primaryClass={cs.DC}
}
As supercomputers grow larger and are equipped with powerful Graphics Processing Units (GPUs), traditional direct eigensolvers struggle to keep up with the hardware evolution and to scale efficiently because of their communication and synchronization demands. Conversely, subspace eigensolvers, such as the Chebyshev Accelerated Subspace Eigensolver (ChASE), have a simpler structure and can overcome communication and synchronization bottlenecks. ChASE is a modern subspace eigensolver that uses Chebyshev polynomials to accelerate the computation of extremal eigenpairs of dense Hermitian eigenproblems. In this work we show how we have modified ChASE by rethinking its memory layout, introducing a novel parallelization scheme, switching to a better-performing communication-avoiding algorithm for one of its inner modules, and substituting the MPI library with the vendor-optimized NCCL library. The resulting library can tackle dense problems with size up to N = O(10^6) and scales effortlessly up to the full 900 nodes, each powered by 4 NVIDIA A100 GPUs, of the JUWELS Booster hosted at the Jülich Supercomputing Centre.
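To illustrate the last point of the abstract, the sketch below (not taken from the ChASE sources) contrasts a host-staged MPI_Allreduce, a common pattern when the MPI installation is not CUDA-aware, with an in-place ncclAllReduce on GPU-resident data. Buffer length, data type, and the GPU-per-rank mapping are assumptions made only for this example.

// Minimal sketch (not ChASE code): reduce a GPU-resident block of doubles across ranks.
// The MPI path stages data through host memory; the NCCL path reduces device buffers directly.
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const size_t count = 1 << 20;      // hypothetical block length
    cudaSetDevice(rank % 4);           // e.g. 4 GPUs per node, as on the JUWELS Booster
    double* d_buf = nullptr;
    cudaMalloc(&d_buf, count * sizeof(double));
    cudaMemset(d_buf, 0, count * sizeof(double));

    // --- MPI variant: copy to host, reduce, copy back to device ---
    std::vector<double> h_buf(count);
    cudaMemcpy(h_buf.data(), d_buf, count * sizeof(double), cudaMemcpyDeviceToHost);
    MPI_Allreduce(MPI_IN_PLACE, h_buf.data(), (int)count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    cudaMemcpy(d_buf, h_buf.data(), count * sizeof(double), cudaMemcpyHostToDevice);

    // --- NCCL variant: reduce directly on device memory, enqueued on a CUDA stream ---
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);  // MPI used only for bootstrap
    ncclComm_t comm;
    ncclCommInitRank(&comm, nranks, id, rank);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    ncclAllReduce(d_buf, d_buf, count, ncclDouble, ncclSum, comm, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    ncclCommDestroy(comm);
    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

Because the NCCL call is asynchronous with respect to the host and attached to a CUDA stream, it avoids the device-to-host staging copies and can be overlapped with other device work; error checking is omitted here for brevity.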
October 8, 2023 by hgpu