Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA
Department of Computing and Information Systems, Trent University, Peterborough, Ontario, Canada
Journal of Physics: Conference Series, 341, 012018, 2012
We evaluate a novel implementation of a Self-Organizing Map (SOM) on a Graphics Processing Unit (GPU) cluster. Using various combinations of OpenCL, CUDA, and two different graphics cards, we demonstrate the scalability of the SOM implementation on one to eight GPUs. Results indicate that the algorithm scales well with the number of training samples and the map size. However, the benefits of the data-parallel approach offered by the GPU are severely limited when it is combined with the Message Passing Interface (MPI) in this setting, and the resulting speedups are comparable to those of GPU-based implementations measured against optimized sequential code. Speedups achieved range from 3 to 32 for various map and training data sizes. We also observed a performance penalty for the OpenCL implementation as compared to CUDA.
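The abstract does not spell out the training rule, so the following is only a minimal sketch of how a data-parallel SOM epoch is commonly organized, assuming a batch formulation with a Gaussian neighborhood. Each shard of training samples stands in for the data held by one worker, and the loop that adds up the partial sums stands in for an MPI-style sum reduction. The function names (`bmu_index`, `partial_sums`, `batch_epoch`) and the parameter `sigma` are hypothetical and not taken from the paper.

```python
import math

def bmu_index(weights, x):
    """Return the index of the map unit whose weight vector is closest to x."""
    best, best_d = 0, float("inf")
    for i, w in enumerate(weights):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = i, d
    return best

def partial_sums(weights, coords, shard, sigma):
    """Work of one hypothetical worker: accumulate sums over its own shard."""
    dim = len(weights[0])
    num = [[0.0] * dim for _ in weights]   # neighborhood-weighted sample sums
    den = [0.0] * len(weights)             # sums of neighborhood weights
    for x in shard:
        b = bmu_index(weights, x)
        for j, c in enumerate(coords):
            # Squared grid distance between unit j and the best-matching unit.
            d2 = sum((a - e) ** 2 for a, e in zip(c, coords[b]))
            h = math.exp(-d2 / (2.0 * sigma ** 2))
            den[j] += h
            for k in range(dim):
                num[j][k] += h * x[k]
    return num, den

def batch_epoch(weights, coords, shards, sigma):
    """One batch epoch: combine per-shard sums, then update every unit."""
    dim = len(weights[0])
    num = [[0.0] * dim for _ in weights]
    den = [0.0] * len(weights)
    for shard in shards:  # assumption: stands in for a sum reduction across workers
        n, d = partial_sums(weights, coords, shard, sigma)
        for j in range(len(weights)):
            den[j] += d[j]
            for k in range(dim):
                num[j][k] += n[j][k]
    return [
        [num[j][k] / den[j] for k in range(dim)] if den[j] > 0 else list(weights[j])
        for j in range(len(weights))
    ]
```

In this formulation the per-shard work is independent, and only the accumulated sums (one vector and one scalar per map unit) need to be exchanged each epoch, so the communication volume grows with the map size rather than with the number of training samples. Splitting the data into more shards gives the same updated map up to floating-point rounding.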
February 11, 2012 by hgpu