Accelerating MapReduce on a coupled CPU-GPU architecture

Linchuan Chen, Xin Huo, Gagan Agrawal

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210

International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12), 2012

@inproceedings{chen2012accelerating,
  title={Accelerating MapReduce on a coupled CPU-GPU architecture},
  author={Chen, L. and Huo, X. and Agrawal, G.},
  booktitle={Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis},
  pages={25},
  year={2012},
  organization={IEEE Computer Society Press}
}

The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging, and hold much promise for supporting power-efficient and scalable high performance computing. Second, MapReduce has emerged as a suitable framework for simplified parallel application development for many classes of applications, including data mining and machine learning applications that benefit from accelerators. This paper focuses on the challenge of scaling a MapReduce application using the CPU and GPU together in an integrated architecture. We develop two methods for dividing the work: the map-dividing scheme, in which map tasks are divided between the two devices, and the pipelining scheme, which pipelines the map and reduce stages on different devices. We develop dynamic work distribution schemes for both approaches. To achieve good load balance while keeping scheduling costs low, we use a runtime tuning method to adjust task block sizes for the map-dividing scheme. Our implementation of MapReduce is based on a continuous reduction method, which avoids the memory overhead of storing key-value pairs. We have evaluated the different design decisions using five popular MapReduce applications. For four of the applications, our system achieves speedups of 1.21x to 2.1x over the better of the CPU-only and GPU-only versions. The speedups over execution on a single CPU core range from 3.25x to 28.68x. The runtime tuning method we have developed achieves very low load imbalance while keeping scheduling overheads low. Though our current work is specific to MapReduce, many of the underlying ideas are also applicable to intra-node acceleration of other applications on integrated CPU-GPU nodes.
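To make two of the ideas in the abstract concrete, here is a minimal Python sketch of dynamic map-dividing combined with continuous reduction. It is an illustration under stated assumptions, not the authors' implementation: two ordinary threads stand in for the CPU and the GPU, the workload is a word count, the block size is fixed rather than tuned at runtime, and the names `run_map_dividing`, `block_size`, and `workers` are hypothetical.

```python
import threading
from collections import Counter


def run_map_dividing(records, block_size=4, workers=("cpu", "gpu")):
    """Word count with dynamic block distribution and continuous reduction.

    Assumption: each worker name stands in for a device. Workers repeatedly
    claim the next block of records from a shared offset, so a faster
    worker naturally ends up processing more blocks.
    """
    next_index = 0
    lock = threading.Lock()
    # One private reduction object per worker: emitted (key, value) pairs
    # are folded in immediately and never stored as a list of pairs.
    partials = {name: Counter() for name in workers}
    blocks_taken = {name: 0 for name in workers}

    def worker(name):
        nonlocal next_index
        while True:
            # Claim the next task block; the lock is the scheduling cost.
            with lock:
                start = next_index
                if start >= len(records):
                    return
                next_index = start + block_size
            blocks_taken[name] += 1
            for record in records[start:start + block_size]:
                for word in record.split():
                    partials[name][word] += 1  # map and reduce fused

    threads = [threading.Thread(target=worker, args=(n,)) for n in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Final step: merge the per-worker reduction objects.
    total = Counter()
    for partial in partials.values():
        total.update(partial)
    return total, blocks_taken


if __name__ == "__main__":
    counts, taken = run_map_dividing(["a b a", "b c", "a"] * 5, block_size=2)
    print(dict(counts), taken)
```

In this sketch the block size controls the trade-off the abstract mentions: smaller blocks balance load more evenly but make workers take the lock more often, while larger blocks reduce scheduling cost at the risk of one worker finishing late.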

Tags: APU, ATI, ATI Radeon HD 6550, Computer science, Data mining, Heterogeneous systems, Machine learning, MapReduce, OpenCL

November 20, 2012 by hgpu
