Accelerating Mahout on heterogeneous clusters using HadoopCL

Xiangyu Li
Northeastern University, Boston, Massachusetts
Northeastern University, 2014







MapReduce is a programming model capable of processing massive data in parallel across hundreds of computing nodes in a cluster. It hides many of the complicated details of parallel computing and gives programmers a straightforward interface to which they can adapt their algorithms, improving productivity. Many applications, including machine learning, have been built on this model. MapReduce can meet the demands of processing the massive data generated by user-server interaction in applications such as web search, video viewing and online product purchasing. The Mahout recommendation system is one of the most popular open source recommendation systems that employ machine learning techniques based on MapReduce. Mahout provides a parallel computing infrastructure that can be applied to a range of different types of datasets. A complementary trend in cluster computing is the introduction of GPUs, which provide higher bandwidth and data-level parallelism. Several efforts have combined the simplicity of the MapReduce framework with the power of GPUs. HadoopCL is one such framework: it automatically generates OpenCL programs from Java to be executed on heterogeneous architectures in a cluster, and it provides the infrastructure for utilizing GPUs in a cluster environment.

In this work, we present a detailed description of the Mahout recommender system and a profile of Mahout's performance when running on multiple nodes in a cluster. We also present a performance evaluation of a Mahout job running on heterogeneous platforms using CPUs, AMD APUs and NVIDIA discrete GPUs with HadoopCL. We choose a time-consuming job in Mahout and manually tune a GPU kernel for it. We also modify the HadoopCL pipeline from map->reduce to filter->map->reduce, which increases the flexibility of HadoopCL in task assignment. We analyze the performance issues of the automatically generated OpenCL GPU programs and describe the optimizations we make to resolve them. We achieve roughly 1.5X to 2X speedup by integrating the optimized GPU kernel into HadoopCL on an APU cluster, and 2X to 4X speedup on a discrete GPU cluster.
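To make the filter->map->reduce ordering concrete, the following is a minimal single-process sketch in plain Java. It does not use the HadoopCL or Hadoop APIs, and every name in it (FilterMapReduceSketch, Rating, the minValue threshold) is hypothetical. It also assumes the filter stage simply selects which records are passed on to the map stage; the abstract does not specify what the filter in the modified HadoopCL pipeline actually does.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FilterMapReduceSketch {

    /** Hypothetical (user, item, rating) preference record used only in this sketch. */
    static final class Rating {
        final int user;
        final int item;
        final double value;

        Rating(int user, int item, double value) {
            this.user = user;
            this.item = item;
            this.value = value;
        }
    }

    /** Filter stage (assumed role): keep only records that should reach the map stage. */
    static List<Rating> filter(List<Rating> input, double minValue) {
        List<Rating> kept = new ArrayList<>();
        for (Rating r : input) {
            if (r.value >= minValue) {
                kept.add(r);
            }
        }
        return kept;
    }

    /** Map stage: emit one (item, rating value) pair per record. */
    static List<Map.Entry<Integer, Double>> map(List<Rating> input) {
        List<Map.Entry<Integer, Double>> pairs = new ArrayList<>();
        for (Rating r : input) {
            pairs.add(new AbstractMap.SimpleEntry<>(r.item, r.value));
        }
        return pairs;
    }

    /** Reduce stage: sum the emitted values for each item key. */
    static Map<Integer, Double> reduce(List<Map.Entry<Integer, Double>> pairs) {
        Map<Integer, Double> totals = new TreeMap<>();
        for (Map.Entry<Integer, Double> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Double::sum);
        }
        return totals;
    }

    /** Runs the three stages in order: filter, then map, then reduce. */
    static Map<Integer, Double> run(List<Rating> input, double minValue) {
        return reduce(map(filter(input, minValue)));
    }

    public static void main(String[] args) {
        List<Rating> ratings = Arrays.asList(
                new Rating(1, 10, 5.0),
                new Rating(2, 10, 3.0),
                new Rating(1, 20, 1.0),
                new Rating(3, 20, 4.0));
        // The rating of 1.0 is dropped by the filter before mapping.
        System.out.println(run(ratings, 2.0)); // prints {10=8.0, 20=4.0}
    }
}
```

In a real cluster the stages would run as distributed tasks rather than method calls; the sketch only shows the order in which the stages are composed.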