NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on MapReduce

Amol Ghoting, Prabhanjan Kambadur, Edwin Pednault, Ramakrishnan Kannan
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’11, 2011

@inproceedings{ghoting2011nimble,
   title={{NIMBLE}: a toolkit for the implementation of parallel data mining and machine learning algorithms on {MapReduce}},
   author={Ghoting, A. and Kambadur, P. and Pednault, E. and Kannan, R.},
   booktitle={Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining},
   pages={334--342},
   year={2011},
   organization={ACM}
}

In the last decade, advances in data collection and storage technologies have led to an increased interest in designing and implementing large-scale parallel algorithms for machine learning and data mining (ML-DM). Existing programming paradigms for expressing large-scale parallelism such as MapReduce (MR) and the Message Passing Interface (MPI) have been the de facto choices for implementing these ML-DM algorithms. The MR programming paradigm has been of particular interest as it gracefully handles large datasets and has built-in resilience against failures. However, the existing parallel programming paradigms are too low-level and ill-suited for implementing ML-DM algorithms. To address this deficiency, we present NIMBLE, a portable infrastructure that has been specifically designed to enable the rapid implementation of parallel ML-DM algorithms. The infrastructure allows one to compose parallel ML-DM algorithms using reusable (serial and parallel) building blocks that can be efficiently executed using MR and other parallel programming models; it currently runs on top of Hadoop, which is an open-source MR implementation. We show how NIMBLE can be used to realize scalable implementations of ML-DM algorithms and present a performance evaluation.
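The abstract describes composing parallel ML-DM algorithms from reusable serial and parallel building blocks that execute as MapReduce jobs. The sketch below illustrates that style only. It is a single-process Python simulation that expresses one k-means iteration as a map step and a reduce step, then chains that step in an iterative driver. K-means is used here only as a familiar example, and the abstract does not say which algorithms the paper evaluates. The names `map_reduce`, `kmeans_step` and `kmeans` are hypothetical and do not reflect NIMBLE's actual API, which runs on Hadoop.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Point = Tuple[float, ...]


def map_reduce(
    records: Iterable,
    mapper: Callable,
    reducer: Callable,
) -> Dict[Hashable, object]:
    """Serial stand-in for one MR job (hypothetical helper, not NIMBLE's API).

    Applies `mapper` to every record, groups the emitted (key, value) pairs by
    key (the "shuffle"), then calls `reducer(key, values)` once per group.
    """
    groups: Dict[Hashable, List] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_step(points: List[Point], centroids: List[Point]) -> List[Point]:
    """One k-means iteration expressed as a single MR job."""

    def mapper(p: Point):
        # Serial building block: emit the point keyed by its nearest centroid.
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        yield nearest, p

    def reducer(cluster_id: int, members: List[Point]) -> Point:
        # Serial building block: coordinate-wise mean of the cluster's points.
        n = len(members)
        return tuple(sum(coords) / n for coords in zip(*members))

    updated = map_reduce(points, mapper, reducer)
    # A cluster that received no points keeps its previous centroid.
    return [updated.get(i, c) for i, c in enumerate(centroids)]


def kmeans(points: List[Point], centroids: List[Point], iterations: int) -> List[Point]:
    # Iterative driver: chain the same MR job, feeding each result into the next.
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)], iterations=3))
```

Running the script prints `[(0.0, 0.5), (10.0, 10.5)]`. On a real MR runtime such as Hadoop, the grouping loop inside `map_reduce` corresponds to the shuffle phase. Each pass of the driver loop would typically launch a separate job, which is one reason iterative ML-DM algorithms are awkward to express directly in plain MR.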