CUDA-level performance with Python-level productivity for Gaussian mixture model applications
Parallel Computing Laboratory, Computer Science Division, University of California at Berkeley
Proceedings of the 3rd USENIX Conference on Hot Topics in Parallelism, HotPar’11, 2011
@inproceedings{cook2011cuda,
title={CUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications},
author={Cook, H. and Gonina, E. and Kamil, S. and Friedland, G. and Patterson, D. and Fox, A.},
booktitle={Proceedings of the 3rd USENIX Conference on Hot Topics in Parallelism (HotPar'11)},
year={2011},
publisher={USENIX Association}
}
Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large, computationally intensive problems must eventually be recoded in a low-level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware, depending on the input data size and the hardware parameters. We show how to preserve the productivity of high-level languages while obtaining the performance of the best low-level language code variant for a given hardware platform and problem size using SEJITS, a set of techniques that leverages just-in-time code generation and compilation. As a case study, we demonstrate our technique for Gaussian Mixture Model training using the EM algorithm. With the addition of one line of code to import our framework, a domain programmer using an existing Python GMM library can run her program unmodified on a GPU-equipped computer and achieve performance that meets or beats GPU code hand-crafted by a human expert. We also show that despite the overhead of allowing the domain expert’s program to use Python and the overhead of just-in-time code generation and compilation, our approach still results in performance competitive with hand-crafted GPU code.
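For readers unfamiliar with the computation being accelerated, the following is a minimal pure-Python sketch of EM training for a one-dimensional Gaussian mixture. It is not the authors' code and does not use SEJITS or a GPU; the names `gaussian_pdf` and `em_gmm_1d`, the fixed iteration count, and the variance floor `min_var` are assumptions made only for illustration. It shows the kind of high-level E-step/M-step loop that the paper aims to keep in Python while moving the heavy work to generated low-level code.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with plain EM (illustrative helper, not a library API).

    Returns the updated (means, variances, weights) as lists.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    n, k = len(data), len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor keeps the density finite
            weights[j] = nj / n
    return means, variances, weights
```

On realistic inputs (many points, many dimensions, many components) the nested loops in the E-step dominate the runtime, which is why this step is a natural target for parallel code generation.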
September 7, 2011 by hgpu