Exact diagonalization of quantum lattice models on coprocessors
Aalto University School of Science, P.O. Box 14100, 00076 Aalto, Finland
arXiv:1511.00863 [cond-mat.str-el], (3 Nov 2015)
@article{siro2015exact,
title={Exact diagonalization of quantum lattice models on coprocessors},
author={Siro, Topi and Harju, Ari},
year={2015},
month={nov},
eprint={1511.00863},
archivePrefix={arXiv},
primaryClass={cond-mat.str-el}
}
We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
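The quantity timed in the abstract, a single Lanczos step, can be illustrated with a minimal pure-Python sketch. This is not the authors' OpenMP or CUDA code: the dense `matvec`, the function name `lanczos_step`, and the 4-site hopping chain used as input are all assumptions made for illustration only. In a realistic calculation the Hamiltonian-vector product dominates the cost of the step, which is presumably why it is the part worth parallelizing.

```python
import math


def matvec(h, v):
    """Dense matrix-vector product; a stand-in for applying the Hamiltonian."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]


def lanczos_step(h, v_prev, v_cur, beta_prev):
    """One step of the standard Lanczos three-term recurrence.

    Returns (alpha, beta, v_next), where alpha and beta are the new diagonal
    and off-diagonal entries of the tridiagonal matrix.
    """
    # w = H v_j - beta_{j-1} v_{j-1}
    w = matvec(h, v_cur)
    w = [wi - beta_prev * pi for wi, pi in zip(w, v_prev)]
    # alpha_j = <w, v_j>, then remove the component along v_j
    alpha = sum(wi * ci for wi, ci in zip(w, v_cur))
    w = [wi - alpha * ci for wi, ci in zip(w, v_cur)]
    # beta_j = ||w||; normalize to obtain the next Lanczos vector
    beta = math.sqrt(sum(wi * wi for wi in w))
    v_next = [wi / beta for wi in w] if beta > 0.0 else w
    return alpha, beta, v_next


# Hypothetical toy input: 4-site open chain with nearest-neighbour hopping -1.
n = 4
h = [[-1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
v0 = [0.0] * n                 # no previous vector on the first step
v1 = [1.0, 0.0, 0.0, 0.0]      # normalized starting vector
alpha, beta, v2 = lanczos_step(h, v0, v1, 0.0)
print(alpha, beta, v2)
```

Repeating the step with `(v1, v2, beta)` as the next input builds up the tridiagonal matrix whose lowest eigenvalue approximates the ground-state energy.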
November 4, 2015 by hgpu