## Optimization of Hierarchical Matrix Computation on GPU

Kyushu University, Fukuoka, Japan

Supercomputing Frontiers. Lecture Notes in Computer Science, vol 10776. Springer, 2018

@inproceedings{ohshima2018optimization,

title={Optimization of Hierarchical Matrix Computation on GPU},

author={Ohshima, Satoshi and Yamazaki, Ichitaro and Ida, Akihiro and Yokota, Rio},

booktitle={Asian Conference on Supercomputing Frontiers},

pages={274--292},

year={2018},

organization={Springer}

}

The demand for dense matrix computation in large-scale and complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, computation with H-matrices is more complex than that with dense and sparse matrices; thus, accelerating H-matrix computation is required. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times with those of OpenMP implementations on various processors (Broadwell-EP, Skylake-SP, and Knights Landing). The results show that, although an HMVM can be computed as many small GEMV kernels, merging such kernels into a single GPU kernel was the most effective implementation. Moreover, the performance of BATCHED BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.
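To illustrate why HMVM breaks down into many small GEMV operations, here is a minimal CPU-side sketch in plain Python. It is not the paper's implementation: the block layout (`row0`, `col0`, `kind`, `data`), the function names, and the 4x4 example are assumptions made for illustration. Each leaf block is either dense or a low-rank product `U V`, and a single loop over all blocks plays the role of the merged kernel.

```python
# Hypothetical, simplified H-matrix layout (an assumption, not the paper's data structure).

def gemv(a, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def hmvm(blocks, x, n_rows):
    """Accumulate y = A x over all leaf blocks in one pass.

    Each block is (row0, col0, kind, data):
      kind == "dense":   data = D       (m x n)
      kind == "lowrank": data = (U, V)  (m x k and k x n), block ~= U V
    """
    y = [0.0] * n_rows
    for row0, col0, kind, data in blocks:
        if kind == "dense":
            n_cols = len(data[0])
            part = gemv(data, x[col0:col0 + n_cols])
        else:
            u, v = data
            n_cols = len(v[0])
            t = gemv(v, x[col0:col0 + n_cols])  # small GEMV producing a k-vector
            part = gemv(u, t)                   # small GEMV producing an m-vector
        # Scatter-add the block's contribution into the global result.
        for i, value in enumerate(part):
            y[row0 + i] += value
    return y


# 4x4 example: dense diagonal blocks, rank-1 off-diagonal blocks.
blocks = [
    (0, 0, "dense", [[4.0, 1.0], [1.0, 4.0]]),
    (2, 2, "dense", [[4.0, 1.0], [1.0, 4.0]]),
    (0, 2, "lowrank", ([[1.0], [2.0]], [[0.5, 0.25]])),
    (2, 0, "lowrank", ([[0.5], [0.25]], [[1.0, 2.0]])),
]
x = [1.0, 2.0, 3.0, 4.0]
y = hmvm(blocks, x, 4)
print(y)
```

On a GPU, each iteration of the loop over `blocks` would roughly correspond to one small GEMV launch in a naive implementation, or to the work assigned to one thread block when the small products are merged into a single kernel or issued through a batched BLAS call.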

March 25, 2018 by hgpu