Performance Characterization and Optimization of Atomic Operations on AMD GPUs
Department of Computer Science, Virginia Tech
Proceedings of the IEEE Cluster 2011
@InProceedings{elteir-ieeecluster11-atomic-operations,
author={Elteir, Marwa and Lin, Heshan and Feng, Wu-chun},
title={Performance Characterization and Optimization of Atomic Operations on AMD GPUs},
booktitle={IEEE Cluster 2011},
address={Austin, TX, USA},
month={September},
year={2011}
}
Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate execution between concurrent threads, and in turn, assist in constructing complex data structures such as hash tables or implementing GPU-wide barrier synchronization. While the performance of atomic operations has improved substantially on the latest NVIDIA Fermi-based GPUs, system-provided atomic operations still incur significant performance penalties on AMD GPUs. A memory-bound kernel on an AMD GPU, for example, can suffer severe performance degradation when it includes an atomic operation, even if the atomic operation is never executed. In this paper, we first quantify the performance impact of atomic instructions on application kernels on AMD GPUs. We then propose a novel software-based implementation of atomic operations that can significantly improve the overall kernel performance. We evaluate its performance against the system-provided atomic operations using two micro-benchmarks and four real applications. The results show that using our software-based atomic operations on an AMD GPU can speed up an application kernel by 67-fold over the same application kernel with the (default) system-provided atomic operations.
October 11, 2011 by hgpu