Accelerating mahout on heterogeneous clusters using HadoopCL


Xiangyu Li

Northeastern University, Boston, Massachusetts

Northeastern University, 2014


MapReduce is a programming model for processing massive datasets in parallel across hundreds of computing nodes in a cluster. It hides many of the complicated details of parallel computing and provides a straightforward interface through which programmers can adapt their algorithms, improving productivity. Many applications, including machine learning, have exploited the power of this model. MapReduce can meet the demands of processing the massive data generated by user-server interaction in applications such as web search, video viewing, and online product purchasing. The Mahout recommendation system is one of the most popular open-source recommendation systems that employ machine learning techniques based on MapReduce. Mahout provides a parallel computing infrastructure that can be applied to a range of different types of datasets. A complementary trend in cluster computing is the introduction of GPUs, which provide higher memory bandwidth and data-level parallelism. Several efforts have combined the simplicity of the MapReduce framework with the power of GPUs. HadoopCL is one such framework: it automatically generates OpenCL programs from Java code to be executed on heterogeneous architectures in a cluster, providing the infrastructure for utilizing GPUs in a cluster environment. In this work, we present a detailed description of the Mahout recommender system and a profile of Mahout performance running on multiple nodes in a cluster. We also present a performance evaluation of a Mahout job running with HadoopCL on heterogeneous platforms using CPUs, AMD APUs, and NVIDIA discrete GPUs. We select a time-consuming job in Mahout and manually tune a GPU kernel for it. We also modify the HadoopCL pipeline from map->reduce to filter->map->reduce, which increases the flexibility of HadoopCL in task assignment.
We analyze the performance issues of the automatically generated OpenCL GPU programs and describe the optimizations we make to resolve them. Integrating the optimized GPU kernel into HadoopCL yields a speedup of roughly 1.5X to 2X on an APU cluster and 2X to 4X on a discrete GPU cluster.
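To make the map->reduce versus filter->map->reduce distinction concrete, here is a minimal single-process sketch in Python. It is not the Hadoop, Mahout, or HadoopCL API: `run_pipeline`, `item_count_mapper`, `sum_reducer`, and `high_rating_filter` are hypothetical names, and the (user, item, preference) records are made-up sample data. The abstract says only that the added filter stage increases flexibility in task assignment; this sketch does not model how HadoopCL assigns tasks to CPUs or GPUs, and it simply drops records before they reach the mapper.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

# Hypothetical record layout for this toy example: (user_id, item_id, preference).
Record = Tuple[int, int, float]


def run_pipeline(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterable[Tuple[Hashable, int]]],
    reducer: Callable[[Hashable, List[int]], int],
    record_filter: Optional[Callable[[Record], bool]] = None,
) -> Dict[Hashable, int]:
    """Simulate filter -> map -> shuffle -> reduce in one process.

    With record_filter=None this is an ordinary map -> reduce job.
    """
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for record in records:
        # Optional filter stage: skip records before the mapper sees them.
        if record_filter is not None and not record_filter(record):
            continue
        # Map stage: a record may emit zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce stage: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def item_count_mapper(record: Record) -> List[Tuple[int, int]]:
    _user, item, _pref = record
    return [(item, 1)]  # emit one count per rating of this item


def sum_reducer(_key: Hashable, values: List[int]) -> int:
    return sum(values)


def high_rating_filter(record: Record) -> bool:
    return record[2] >= 3.0  # keep only preferences of 3.0 or more


prefs: List[Record] = [
    (1, 101, 5.0), (1, 102, 2.0), (2, 101, 4.0),
    (2, 103, 3.5), (3, 102, 1.0), (3, 101, 3.0),
]

print(run_pipeline(prefs, item_count_mapper, sum_reducer))
# {101: 3, 102: 2, 103: 1}
print(run_pipeline(prefs, item_count_mapper, sum_reducer, high_rating_filter))
# {101: 3, 103: 1}
```

Passing `record_filter=None` reproduces a plain map->reduce job; supplying a filter removes item 102, whose ratings are all below 3.0, before any mapper runs.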

Tags: Cluster computing, Computer science, GPU cluster, Heterogeneous systems, Java, Machine learning, MapReduce, nVidia, OpenCL, Tesla K20, Thesis

January 19, 2015 by hgpu
