high performance computing on graphics processing units: hgpu.org

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs

Moritz Kreutzer, Georg Hager, Dominik Ernst, Holger Fehske, Alan R. Bishop, Gerhard Wellein

Erlangen Regional Computing Center (RRZE), Friedrich-Alexander University of Erlangen-Nuremberg

arXiv:1803.02156 [cs.MS], (6 Mar 2018)

@article{kreutzer2018chebyshev,
  title={Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs},
  author={Kreutzer, Moritz and Hager, Georg and Ernst, Dominik and Fehske, Holger and Bishop, Alan R. and Wellein, Gerhard},
  year={2018},
  month={mar},
  archivePrefix={arXiv},
  primaryClass={cs.MS}
}

Source codes

Package:

GHOST: General, Hybrid and Optimized Sparse Toolkit

Chebyshev filter diagonalization is well established in quantum chemistry and quantum physics to compute bulks of eigenvalues of large sparse matrices. Choosing a block vector implementation, we investigate optimization opportunities on the new class of high-performance compute devices featuring both high-bandwidth and low-bandwidth memory. We focus on the transparent access to the full address space supported by both architectures under consideration: Intel Xeon Phi "Knights Landing" and Nvidia "Pascal." We propose two optimizations: (1) Subspace blocking is applied for improved performance and data access efficiency. We also show that it allows transparently handling problems much larger than the high-bandwidth memory without significant performance penalties. (2) Pipelining of communication and computation phases of successive subspaces is implemented to hide communication costs without extra memory traffic. As an application scenario we use filter diagonalization studies on topological insulator materials. Performance numbers on up to 512 nodes of the OakForest-PACS and Piz Daint supercomputers are presented, achieving beyond 100 Tflop/s for computing 100 inner eigenvalues of sparse matrices of dimension one billion.
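The abstract does not spell out the filter itself, so the following is only a minimal sketch of the general idea, not the authors' implementation and not the GHOST API. It assumes the filter is a Chebyshev expansion sum_k c_k T_k(Ã) V evaluated with the standard three-term recurrence, where Ã is the matrix rescaled so that its spectrum lies in [-1, 1]. The function names, the list-of-rows sparse format, the caller-supplied coefficients `coeffs`, and the spectral bounds `lam_min` and `lam_max` are all hypothetical. The second function illustrates why subspace blocking is possible at all: the filter acts on each column of the block vector independently, so the columns can be processed in narrower chunks without changing the result.

```python
def spmv_block(rows, X):
    """Multiply a sparse matrix by a block vector.

    rows[i] is a list of (column, value) pairs for row i (assumed format);
    X[i] holds the entries of row i, one per vector in the block.
    """
    nb = len(X[0])
    Y = [[0.0] * nb for _ in rows]
    for i, row in enumerate(rows):
        y_row = Y[i]
        for j, a in row:
            x_row = X[j]
            for b in range(nb):
                y_row[b] += a * x_row[b]
    return Y


def chebyshev_filter(rows, V, coeffs, lam_min, lam_max):
    """Return sum_k coeffs[k] * T_k(A_scaled) V; coeffs must be non-empty."""
    # Affine map that sends [lam_min, lam_max] to [-1, 1].
    alpha = 2.0 / (lam_max - lam_min)
    beta = -(lam_max + lam_min) / (lam_max - lam_min)

    def scaled_apply(X):
        AX = spmv_block(rows, X)
        return [[alpha * ax + beta * x for ax, x in zip(ax_row, x_row)]
                for ax_row, x_row in zip(AX, X)]

    def axpy(c, X, Y):
        # Y += c * X, in place.
        for x_row, y_row in zip(X, Y):
            for b, x in enumerate(x_row):
                y_row[b] += c * x

    w_prev = [row[:] for row in V]                      # T_0(A) V
    out = [[coeffs[0] * x for x in row] for row in w_prev]
    if len(coeffs) == 1:
        return out
    w_cur = scaled_apply(w_prev)                        # T_1(A) V
    axpy(coeffs[1], w_cur, out)
    for c in coeffs[2:]:
        AW = scaled_apply(w_cur)
        # T_{k+1} = 2 A T_k - T_{k-1}
        w_next = [[2.0 * a - p for a, p in zip(a_row, p_row)]
                  for a_row, p_row in zip(AW, w_prev)]
        axpy(c, w_next, out)
        w_prev, w_cur = w_cur, w_next
    return out


def filter_in_subspace_blocks(rows, V, coeffs, lam_min, lam_max, block_width):
    """Apply the same filter to column chunks of V of width block_width."""
    n, ns = len(V), len(V[0])
    out = [[0.0] * ns for _ in range(n)]
    for start in range(0, ns, block_width):
        stop = min(start + block_width, ns)
        chunk = [row[start:stop] for row in V]
        filtered = chebyshev_filter(rows, chunk, coeffs, lam_min, lam_max)
        for i in range(n):
            out[i][start:stop] = filtered[i]
    return out
```

This sketch only shows that chunking the search space leaves the filtered vectors unchanged. It does not model the high-bandwidth memory, the choice of chunk width, or the pipelining of communication and computation described in the abstract.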

Tags: Chemistry, Computational Physics, Computer science, CUDA, Intel Xeon Phi, nVidia, OpenMP, Package, Performance, Physics, Quantum Physics, Tesla P100

March 10, 2018 by hgpu
