Performance Drawbacks for Matrix Multiplication using Set Associative Cache in GPU devices
FON University, 1000 Skopje, Macedonia
36th International Convention MIPRO, 2013
@article{djinevski2013performance,
title={Performance Drawbacks for Matrix Multiplication using Set Associative Cache in GPU devices},
author={Djinevski, Leonid and Arsenovski, Sime and Ristov, Sasko and Gusev, Marjan},
year={2013}
}
The performance of shared memory processors shows negative performance impulses (drawbacks) in certain regions when executing the basic matrix multiplication algorithm. In this paper we continue our analysis of the GPU memory hierarchy and its cache memory organization. We give a theoretical analysis of why a negative performance impulse appears for specific problem sizes. The main cause is the cache storage organization: the negative performance peak appears because matrix elements are mapped onto a single cache set instead of being spread across the whole cache. The experimental results confirm our theoretical analysis. We also propose a method to avoid the situations in which performance drawbacks appear.
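To illustrate how matrix elements can collapse onto one cache set, here is a minimal Python sketch. It models a hypothetical 16 KB, 4-way set-associative cache with 128-byte lines and plain modulo set indexing; these parameters are assumptions and are not the GPU parameters analyzed in the paper. It counts how many distinct sets are touched when one column of a row-major single-precision matrix is read, which is the access pattern the inner loop of basic matrix multiplication applies to the second operand. The names `set_index` and `sets_touched_by_column` are hypothetical helpers.

```python
# Hypothetical cache geometry (an assumption for illustration, not the
# parameters of any particular GPU; real devices may hash set indices).
LINE_BYTES = 128
WAYS = 4
CACHE_BYTES = 16 * 1024
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 32 sets
ELEM_BYTES = 4  # single-precision float


def set_index(addr: int) -> int:
    """Set index of a byte address under conventional modulo set mapping."""
    return (addr // LINE_BYTES) % NUM_SETS


def sets_touched_by_column(n: int, ld: int) -> int:
    """Distinct sets touched when reading column 0 of a row-major n x n
    matrix whose consecutive rows start ld elements apart (base address 0)."""
    return len({set_index(i * ld * ELEM_BYTES) for i in range(n)})


if __name__ == "__main__":
    for n in (1000, 1024, 1088, 2048):
        plain = sets_touched_by_column(n, n)
        padded = sets_touched_by_column(n, n + 1)  # pad each row by one element
        print(f"N={n}: {plain}/{NUM_SETS} sets, padded: {padded}/{NUM_SETS}")
```

Under this model, N = 1024 and N = 2048 give a row stride that is a multiple of NUM_SETS * LINE_BYTES, so the whole column lands in one set and only 4 of its lines (one per way) can be cached at once. N = 1088 still uses only half of the sets. Padding each row by one element spreads the column over all sets. Padding is a common general remedy shown here as an assumption; the abstract does not detail the avoidance method the authors propose, and it may differ.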
August 19, 2013 by hgpu