Optimization of Hierarchical Matrix Computation on GPU

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota
Kyushu University, Fukuoka, Japan
In: Asian Conference on Supercomputing Frontiers (Supercomputing Frontiers). Lecture Notes in Computer Science, vol 10776. Springer, 2018



The demand for dense matrix computation in large-scale and complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, computation with H-matrices is more complex than that with dense and sparse matrices; thus, accelerating H-matrix computation is required. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times with those of OpenMP implementations on various processors (Broadwell-EP, Skylake-SP, and Knights Landing). The results show that, although HMVM can be computed as many small GEMV kernels, merging such kernels into a single GPU kernel was the most effective implementation. Moreover, the performance of BATCHED BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.
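To make the structure of HMVM concrete, the following is a minimal serial sketch in plain Python. It is not the authors' GPU kernel. It assumes the common H-matrix layout in which the matrix is partitioned into blocks, some stored densely and others stored as a low-rank product U V^T. The block dictionary layout and the names `gemv`, `hmvm`, `example_blocks`, and `x` are hypothetical and chosen only for illustration.

```python
def gemv(A, x):
    """Dense matrix-vector product: returns A @ x for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def hmvm(blocks, x, n_rows):
    """Multiply a block-partitioned (H-matrix-like) operator by the vector x.

    Each block is a dict holding its top-left position ("row", "col") and
    either a dense matrix "A" or low-rank factors "U" (m x k) and "V" (n x k)
    such that the block is approximated by U @ V^T.
    """
    y = [0.0] * n_rows
    for blk in blocks:
        r0, c0 = blk["row"], blk["col"]
        if blk["kind"] == "dense":
            A = blk["A"]
            xs = x[c0:c0 + len(A[0])]
            part = gemv(A, xs)  # one small GEMV
        else:
            U, V = blk["U"], blk["V"]
            xs = x[c0:c0 + len(V)]
            k = len(V[0])
            # t = V^T @ xs, then part = U @ t: two small GEMVs per block
            t = [sum(V[j][p] * xs[j] for j in range(len(V))) for p in range(k)]
            part = gemv(U, t)
        # Scatter the block result into the matching rows of y
        for i, v in enumerate(part):
            y[r0 + i] += v
    return y


# A 4 x 4 example: two dense diagonal blocks, two rank-1 off-diagonal blocks
example_blocks = [
    {"kind": "dense", "row": 0, "col": 0, "A": [[2, 1], [1, 2]]},
    {"kind": "lowrank", "row": 0, "col": 2, "U": [[1], [2]], "V": [[1], [1]]},
    {"kind": "dense", "row": 2, "col": 2, "A": [[3, 0], [0, 3]]},
    {"kind": "lowrank", "row": 2, "col": 0, "U": [[1], [0]], "V": [[0], [1]]},
]
x = [1, 2, 3, 4]
y = hmvm(example_blocks, x, 4)
```

Each loop iteration here corresponds to one or two small GEMV operations. On a GPU, launching these as separate kernels incurs per-launch overhead, which is one plausible reason the merged single-kernel variant described in the abstract performed best.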
