Comprehensive Performance Monitoring for GPU Cluster Systems
Comput. Sci. Dept., Ludwig-Maximilians-Univ. (LMU) Munich, Munich, Germany
IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011
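@inproceedings{furlinger2011comprehensive,
title={Comprehensive Performance Monitoring for GPU Cluster Systems},
booktitle={IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW)},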
author={F{\"u}rlinger, K. and Wright, N.J. and Skinner, D.},
year={2011}
}
Accelerating applications with GPUs has recently garnered a lot of interest from the scientific computing community. While tools for optimizing individual kernels are readily available, there is a lack of support for the specific needs of the HPC area. Most importantly, integration with existing parallel programming models (MPI and threading) and scalability to the full size of the machine are required. To address these issues, we present our work on monitoring and performance evaluation of the CUDA runtime environment in the context of our scalable and efficient profiling tool IPM. We derive metrics for GPU utilization and identify missed opportunities for GPU-CPU overlap. We evaluate the monitoring accuracy and overheads of our approach and apply it to a full scientific application.
November 13, 2011 by hgpu