
Optimization of Hierarchical Matrix Computation on GPU

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota
Kyushu University, Fukuoka, Japan
Supercomputing Frontiers. Lecture Notes in Computer Science, vol. 10776. Springer, 2018

@inproceedings{ohshima2018optimization,

   title={Optimization of Hierarchical Matrix Computation on GPU},

   author={Ohshima, Satoshi and Yamazaki, Ichitaro and Ida, Akihiro and Yokota, Rio},

   booktitle={Asian Conference on Supercomputing Frontiers},

   pages={274--292},

   year={2018},

   organization={Springer}

}


The demand for dense matrix computation in large-scale, complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a technique that reduces the memory requirements of dense matrix computations. However, computing with H-matrices is more complex than computing with dense or sparse matrices, so accelerating H-matrix computations is required. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times against OpenMP implementations on several processors (Broadwell-EP, Skylake-SP, and Knights Landing). The results show that, although HMVM decomposes into many small GEMV operations, merging them into a single GPU kernel was the most effective implementation. Moreover, the performance of the batched BLAS routines in the MAGMA library was comparable to that of the manually tuned GPU kernel.
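To illustrate the merged-kernel idea the abstract describes, the sketch below uses a single CUDA kernel to process a whole batch of small GEMVs, assigning one thread block per leaf block of the H-matrix. This is a minimal sketch under stated assumptions, not the authors' implementation: the names (BlockDesc, small_gemv_batch) are hypothetical, low-rank leaves (which each need two GEMVs) are omitted, and results are accumulated with atomicAdd because leaves in the same block row write to overlapping ranges of y.

// Minimal sketch of a merged batch-of-small-GEMVs kernel (hypothetical names,
// not the authors' code). Compile with: nvcc -arch=sm_60 hmvm_sketch.cu
// (double-precision atomicAdd requires sm_60 or newer, e.g. the P100).
#include <cstdio>
#include <cuda_runtime.h>

struct BlockDesc {            // one dense leaf block of the H-matrix
    const double *A;          // m x n block, column-major
    const double *x;          // slice of the input vector, length n
    double       *y;          // slice of the output vector, length m
    int m, n;
};

__global__ void small_gemv_batch(const BlockDesc *blocks)
{
    BlockDesc b = blocks[blockIdx.x];            // one leaf per thread block
    for (int row = threadIdx.x; row < b.m; row += blockDim.x) {
        double sum = 0.0;
        for (int col = 0; col < b.n; ++col)      // row of A dotted with x
            sum += b.A[row + (size_t)col * b.m] * b.x[col];
        atomicAdd(&b.y[row], sum);               // y slices of leaves may overlap
    }
}

int main()
{
    const int m = 4, n = 4, batch = 2;
    double hA[m * n], hx[n], hy[m] = {0};
    for (int i = 0; i < m * n; ++i) hA[i] = 1.0; // A = all ones
    for (int i = 0; i < n; ++i)     hx[i] = 1.0; // x = all ones

    double *dA, *dx, *dy;
    cudaMalloc(&dA, sizeof hA); cudaMemcpy(dA, hA, sizeof hA, cudaMemcpyHostToDevice);
    cudaMalloc(&dx, sizeof hx); cudaMemcpy(dx, hx, sizeof hx, cudaMemcpyHostToDevice);
    cudaMalloc(&dy, sizeof hy); cudaMemcpy(dy, hy, sizeof hy, cudaMemcpyHostToDevice);

    // Two leaves that happen to share A, x, and y; y accumulates both products.
    BlockDesc hb[batch] = { {dA, dx, dy, m, n}, {dA, dx, dy, m, n} };
    BlockDesc *db;
    cudaMalloc(&db, sizeof hb); cudaMemcpy(db, hb, sizeof hb, cudaMemcpyHostToDevice);

    small_gemv_batch<<<batch, 64>>>(db);         // one launch for the whole batch
    cudaMemcpy(hy, dy, sizeof hy, cudaMemcpyDeviceToHost);
    for (int i = 0; i < m; ++i) printf("y[%d] = %g\n", i, hy[i]);  // expect 8

    cudaFree(dA); cudaFree(dx); cudaFree(dy); cudaFree(db);
    return 0;
}

A batched BLAS library such as MAGMA, which the paper also evaluates, exposes this same pattern through variable-size batched routines; the manual kernel above mainly illustrates what such a library does internally for a batch of small, differently sized GEMVs.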
