high performance computing on graphics processing units: hgpu.org

Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

Mian Lu, Lei Zhang, Huynh Phung Huynh, Zhongliang Ong, Yun Liang, Bingsheng He, Rick Siow Mong Goh, Richard Huynh

Institute of High Performance Computing, A*STAR

arXiv:1309.0215 [cs.DC], (1 Sep 2013)

@article{2013arXiv1309.0215L,
  author={{Lu}, M. and {Zhang}, L. and {Phung Huynh}, H. and {Ong}, Z. and {Liang}, Y. and {He}, B. and {Siow Mong Goh}, R. and {Huynh}, R.},
  title={Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1309.0215},
  primaryClass={cs.DC},
  keywords={Computer Science - Distributed, Parallel, and Cluster Computing},
  year={2013},
  month={sep},
  adsurl={http://adsabs.harvard.edu/abs/2013arXiv1309.0215L},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

With its ease of programming, flexibility, and efficiency, MapReduce has become one of the most popular frameworks for building big-data applications. MapReduce was originally designed for distributed computing and has since been extended to various architectures, e.g., multi-core CPUs, GPUs, and FPGAs. In this work, we focus on optimizing the MapReduce framework on the Xeon Phi, the latest product released by Intel based on the Many Integrated Core Architecture. To the best of our knowledge, this is the first work to optimize the MapReduce framework on the Xeon Phi. We utilize advanced features of the Xeon Phi to achieve high performance. To take advantage of the SIMD vector processing units, we propose a vectorization-friendly technique for the map phase that assists auto-vectorization, and we develop SIMD hash computation algorithms. Furthermore, we utilize MIMD hyper-threading to pipeline the map and reduce phases and thereby improve resource utilization. For some applications, we also eliminate the multiple local arrays and instead use low-cost atomic operations on a single global array, which improves thread scalability and data locality thanks to the coherent L2 caches. Finally, for a given application, our framework can either automatically detect suitable techniques to apply or provide guidelines for users at compilation time. We conduct comprehensive experiments to benchmark the Xeon Phi and compare our optimized MapReduce framework with a state-of-the-art multi-core MapReduce framework (Phoenix++). Evaluating six real-world applications, the experimental results show that our optimized framework is 1.2X to 38X faster than Phoenix++ on the Xeon Phi, depending on the application.
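The trade-off between multiple local arrays and atomic operations on a global array can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a histogram-style workload (counting keys into buckets) as a stand-in application, uses portable C++ threads rather than Xeon Phi specific features, and the names histogram_local, histogram_atomic, and chunk_begin are hypothetical. It only contrasts the two reduction strategies the abstract mentions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: start index of thread t's contiguous chunk of [0, n).
static std::size_t chunk_begin(std::size_t n, unsigned t, unsigned num_threads) {
    return n * t / num_threads;
}

// Strategy A: every thread counts into its own private array, and the
// private arrays are merged once all threads have finished.
std::vector<std::uint64_t> histogram_local(const std::vector<std::uint8_t>& keys,
                                           std::size_t num_buckets,
                                           unsigned num_threads) {
    assert(num_threads > 0 && num_buckets > 0);
    std::vector<std::vector<std::uint64_t>> local(
        num_threads, std::vector<std::uint64_t>(num_buckets, 0));
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t lo = chunk_begin(keys.size(), t, num_threads);
            const std::size_t hi = chunk_begin(keys.size(), t + 1, num_threads);
            for (std::size_t i = lo; i < hi; ++i)
                ++local[t][keys[i] % num_buckets];
        });
    }
    for (auto& w : workers) w.join();

    // Merge step: cost grows with num_threads * num_buckets.
    std::vector<std::uint64_t> result(num_buckets, 0);
    for (const auto& arr : local)
        for (std::size_t b = 0; b < num_buckets; ++b) result[b] += arr[b];
    return result;
}

// Strategy B: all threads update one shared array with atomic increments,
// so there is no per-thread storage and no merge step.
std::vector<std::uint64_t> histogram_atomic(const std::vector<std::uint8_t>& keys,
                                            std::size_t num_buckets,
                                            unsigned num_threads) {
    assert(num_threads > 0 && num_buckets > 0);
    std::vector<std::atomic<std::uint64_t>> global(num_buckets);
    for (auto& c : global) c.store(0, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t lo = chunk_begin(keys.size(), t, num_threads);
            const std::size_t hi = chunk_begin(keys.size(), t + 1, num_threads);
            for (std::size_t i = lo; i < hi; ++i)
                global[keys[i] % num_buckets].fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();

    // Copy the atomic counters into a plain vector for the caller.
    std::vector<std::uint64_t> result(num_buckets);
    for (std::size_t b = 0; b < num_buckets; ++b)
        result[b] = global[b].load(std::memory_order_relaxed);
    return result;
}
```

Both functions return the same counts for the same input. The local-array version needs memory and merge work proportional to the number of threads, which becomes noticeable with the many hardware threads of a coprocessor, whereas the atomic version keeps a single array but pays for contention when many threads hit the same bucket. Which one wins for a given application is exactly the kind of choice the abstract says the framework detects or advises on.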

Tags: Benchmarking, Computer science, Intel Phi, MapReduce

September 4, 2013 by hgpu

