Benchmarking the cost of thread divergence in CUDA

Piotr Bialas, Adam Strzelecki
Faculty of Physics, Astronomy and Computer Science, Jagiellonian University, ul. Lojasiewicza 11, 30-348 Krakow, Poland
arXiv:1504.01650 [cs.PF] (7 Apr 2015)

@article{bialas2015benchmarking,
   title={Benchmarking the cost of thread divergence in CUDA},
   author={Bialas, Piotr and Strzelecki, Adam},
   year={2015},
   month={apr},
   eprint={1504.01650},
   archivePrefix={arXiv},
   primaryClass={cs.PF}
}

All modern processors include a set of vector instructions. While this gives a tremendous boost to performance, it requires vectorized code that can take advantage of such instructions. Because ideal vectorization is hard to achieve in practice, one has to decide when different instructions may be applied to different elements of the vector operand. This is especially important in implicit vectorization, as in the NVIDIA CUDA Single Instruction Multiple Threads (SIMT) model, where the vectorization details are hidden from the programmer. To assess the costs incurred by incompletely vectorized code, we have developed a micro-benchmark that measures the characteristics of the CUDA thread divergence model on different architectures, focusing on loop performance.
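
To make the idea of loop divergence concrete, the following is a minimal host-side sketch of a simplified cost model, not the authors' micro-benchmark. It assumes a warp of 32 threads keeps issuing loop iterations until its slowest thread finishes, with threads that have already left the loop masked off. It ignores memory effects and scheduling details that differ between architectures. The function names `warp_loop_cost` and `simt_efficiency` are hypothetical and chosen only for illustration.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_loop_cost(trip_counts):
    """Loop iterations the warp issues: the longest-running thread decides."""
    if len(trip_counts) != WARP_SIZE:
        raise ValueError("expected one trip count per thread in the warp")
    return max(trip_counts)


def simt_efficiency(trip_counts):
    """Fraction of issued thread-slots that perform useful work."""
    issued = warp_loop_cost(trip_counts) * WARP_SIZE
    return sum(trip_counts) / issued if issued else 1.0


# Three illustrative workloads (assumed values, not measured data).
uniform = [100] * WARP_SIZE                      # no divergence
one_straggler = [100] * (WARP_SIZE - 1) + [200]  # one thread runs twice as long
half_idle = [100] * 16 + [0] * 16                # half the warp skips the loop

for name, counts in [("uniform", uniform),
                     ("one_straggler", one_straggler),
                     ("half_idle", half_idle)]:
    print(name, warp_loop_cost(counts), simt_efficiency(counts))
```

Under this model a single slow thread doubles the cost of the whole warp, which is the kind of effect a real benchmark would have to confirm or refute on each architecture.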
* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors