Optimization of the Brillouin operator on the KNL architecture

Stephan Durr
University of Wuppertal, Gaussstrasse 20, D-42119 Wuppertal, Germany
arXiv:1709.01828 [hep-lat], (6 Sep 2017)

@article{durr2017optimization,
   title={Optimization of the Brillouin operator on the KNL architecture},
   author={Durr, Stephan},
   year={2017},
   month={sep},
   eprint={1709.01828},
   archivePrefix={arXiv},
   primaryClass={hep-lat}
}


Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. These figures are obtained with N_c=3 colors, N_v=12 right-hand sides, and N_{thr}=256 threads, on lattices of size 32^3 × 64, using exclusively OMP pragmas. Interestingly, the same routine also performs quite well on Intel Core i7 architectures. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
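
To give a sense of the problem size behind these figures, the short sketch below works out the number of lattice sites and the memory footprint of one multi-right-hand-side vector for the quoted parameters. It is an illustration only, not taken from the paper: the abstract does not state the spinor count, so 4 spinor components per site is an assumption here, and the names `num_sites` and `vector_bytes` are hypothetical.

```python
# Problem-size arithmetic for the parameters quoted in the abstract.
# Assumption (not stated in the abstract): 4 spinor components per site.

LATTICE = (32, 32, 32, 64)  # lattice extents, 32^3 x 64
N_C = 3                     # colors
N_V = 12                    # right-hand sides
N_SPIN = 4                  # assumed spinor components per site


def num_sites(dims):
    """Return the total number of lattice sites."""
    sites = 1
    for extent in dims:
        sites *= extent
    return sites


def vector_bytes(dims, n_spin, n_c, n_v, real_bytes):
    """Return the size in bytes of one vector holding n_v right-hand sides.

    Each entry is a complex number, stored as two reals of real_bytes each.
    """
    return num_sites(dims) * n_spin * n_c * n_v * 2 * real_bytes


sites = num_sites(LATTICE)
single = vector_bytes(LATTICE, N_SPIN, N_C, N_V, 4)
double = vector_bytes(LATTICE, N_SPIN, N_C, N_V, 8)

print(f"sites: {sites}")
print(f"single precision: {single / 2**30:.2f} GiB")
print(f"double precision: {double / 2**30:.2f} GiB")
```

Under these assumptions the lattice has 2,097,152 sites, and one such vector occupies 2.25 GiB in single precision and 4.5 GiB in double precision.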