high performance computing on graphics processing units: hgpu.org

NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce

Amol Ghoting, Prabhanjan Kambadur, Edwin Pednault, Ramakrishnan Kannan

IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’11, 2011

DOI:10.1145/2020408.2020464

In the last decade, advances in data collection and storage technologies have led to an increased interest in designing and implementing large-scale parallel algorithms for machine learning and data mining (ML-DM). Existing programming paradigms for expressing large-scale parallelism such as MapReduce (MR) and the Message Passing Interface (MPI) have been the de facto choices for implementing these ML-DM algorithms. The MR programming paradigm has been of particular interest as it gracefully handles large datasets and has built-in resilience against failures. However, the existing parallel programming paradigms are too low-level and ill-suited for implementing ML-DM algorithms. To address this deficiency, we present NIMBLE, a portable infrastructure that has been specifically designed to enable the rapid implementation of parallel ML-DM algorithms. The infrastructure allows one to compose parallel ML-DM algorithms using reusable (serial and parallel) building blocks that can be efficiently executed using MR and other parallel programming models; it currently runs on top of Hadoop, which is an open-source MR implementation. We show how NIMBLE can be used to realize scalable implementations of ML-DM algorithms and present a performance evaluation.
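The idea of composing an ML-DM algorithm from reusable serial and parallel building blocks can be illustrated with a toy sketch. The code below is not NIMBLE's API and does not use Hadoop: `map_reduce`, `kmeans_step`, and `kmeans` are hypothetical names chosen for illustration, and the MapReduce round runs serially in memory. It only shows how one generic map/shuffle/reduce block might be reused as the parallel step inside an iterative algorithm (here, one-dimensional k-means).

```python
from collections import defaultdict
from functools import reduce

# All names below are hypothetical and for illustration only;
# this is neither NIMBLE's nor Hadoop's API.


def map_reduce(records, mapper, reducer):
    """Reusable block: one serial, in-memory map -> shuffle -> reduce round."""
    groups = defaultdict(list)
    for record in records:
        # The mapper emits zero or more (key, value) pairs per record.
        for key, value in mapper(record):
            groups[key].append(value)
    # The reducer folds all values that share a key into a single value.
    return {key: reduce(reducer, values) for key, values in groups.items()}


def kmeans_step(points, centroids):
    """One 1-D k-means iteration composed from the map_reduce block."""

    def assign(x):
        # Map: emit (index of nearest centroid, (partial sum, count)).
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        yield nearest, (x, 1)

    def combine(a, b):
        # Reduce: add partial sums and counts.
        return (a[0] + b[0], a[1] + b[1])

    sums = map_reduce(points, assign, combine)
    # A centroid with no assigned points keeps its previous position.
    return [
        sums[i][0] / sums[i][1] if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Serial driver that repeats the parallelizable step a fixed number of times."""
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids
```

In this sketch the driver loop is the serial building block and `map_reduce` is the parallel one; in a real deployment the same composition would be handed to a distributed MR runtime rather than executed in a single process.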

Tags: Algorithms, Computer science, Data mining, Machine learning, MapReduce, MPI, OpenCL, Performance, Software Engineering

September 16, 2011 by hgpu
