HadoopCL2: Motivating the design of a distributed, heterogeneous programming system with machine-learning applications

Max Grossman, Mauricio Breternitz, Vivek Sarkar

Rice University, Department of Computer Science, 6100 Main St, Houston, TX, USA

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013

Machine learning (ML) algorithms have garnered increased interest as they demonstrate improved ability to extract meaningful trends from large, diverse, and noisy datasets. While research is advancing the state of the art in ML algorithms, it is difficult to drastically improve the real-world performance of these algorithms. Porting new and existing algorithms from single-node systems to multi-node clusters, or from architecturally homogeneous systems to heterogeneous systems, is a promising optimization technique. However, performing optimized ports is challenging for domain experts who may lack experience in distributed and heterogeneous software development. This work explores how challenges in ML application development on heterogeneous, distributed systems shaped the development of the HadoopCL2 (HCL2) programming system. ML applications guide this work because they exhibit features that make application development difficult: large and diverse datasets, complex algorithms, and the need for domain-specific knowledge. The goal of this work is a general MapReduce programming system that outperforms existing programming systems. This work evaluates the performance and portability of HCL2 using five ML applications from the Mahout ML framework on two hardware platforms. HCL2 demonstrates speedups of greater than 20x relative to Mahout for three computationally heavy algorithms and maintains minor performance improvements for two I/O-bound algorithms.

Tags: Algorithms, Computer science, Hadoop, Heterogeneous systems, Machine learning, MapReduce, nVidia, OpenCL, Spark, Tesla M2050

November 13, 2016 by hgpu
