Accelerating MapReduce on a coupled CPU-GPU architecture

Linchuan Chen, Xin Huo, Gagan Agrawal
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12), 2012


   title={Accelerating MapReduce on a coupled CPU-GPU architecture},
   author={Chen, L. and Huo, X. and Agrawal, G.},
   booktitle={Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis},
   organization={IEEE Computer Society Press}



The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging and hold much promise for power-efficient, scalable high performance computing. Second, MapReduce has emerged as a suitable framework for simplified parallel application development for many classes of applications, including data mining and machine learning applications that benefit from accelerators. This paper focuses on the challenge of scaling a MapReduce application using the CPU and GPU together in an integrated architecture. We develop two methods for dividing the work: the map-dividing scheme, in which map tasks are divided between the two devices, and the pipelining scheme, which pipelines the map and reduce stages on different devices. We develop dynamic work distribution schemes for both approaches. To achieve good load balance while keeping scheduling costs low, we use a runtime tuning method to adjust task block sizes for the map-dividing scheme. Our implementation of MapReduce is based on a continuous reduction method, which avoids the memory overhead of storing key-value pairs. We have evaluated the different design decisions using five popular MapReduce applications. For four of the applications, our system achieves speedups of 1.21x to 2.1x over the better of the CPU-only and GPU-only versions. The speedups over single-core CPU execution range from 3.25x to 28.68x. The runtime tuning method we have developed achieves very low load imbalance while keeping scheduling overheads low. Though our current work is specific to MapReduce, many of the underlying ideas are also applicable to intra-node acceleration of other applications on integrated CPU-GPU nodes.
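To make two of the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It illustrates continuous reduction (each emitted pair is folded into a running per-key value instead of being stored) and a map-dividing scheme with dynamic distribution (each worker pulls the next block of map tasks on demand). All names here (`ReductionObject`, `BlockScheduler`, `run_map_dividing`, `word_count_map`) are hypothetical, the two "devices" are simulated with ordinary threads, and the block sizes are fixed by hand, so the paper's runtime tuning of block sizes is not reproduced.

```python
import threading


def word_count_map(record, emit):
    # Example map function: emit (word, 1) for every word in the record.
    for word in record.split():
        emit(word, 1)


class ReductionObject:
    """Keeps one running value per key, so emitted pairs are never stored."""

    def __init__(self, combine):
        self.combine = combine
        self.values = {}

    def emit(self, key, value):
        # Continuous reduction: fold the value into the running result at once.
        if key in self.values:
            self.values[key] = self.combine(self.values[key], value)
        else:
            self.values[key] = value

    def merge(self, other):
        # Combine another worker's partial results into this object.
        for key, value in other.values.items():
            self.emit(key, value)


class BlockScheduler:
    """Hands out contiguous blocks of map tasks on demand."""

    def __init__(self, num_tasks):
        self.num_tasks = num_tasks
        self.next_task = 0
        self.lock = threading.Lock()

    def next_block(self, block_size):
        with self.lock:
            start = self.next_task
            end = min(start + block_size, self.num_tasks)
            self.next_task = end
        return range(start, end)


def worker(records, scheduler, block_size, map_fn, local):
    # Each simulated device keeps requesting blocks until none are left.
    while True:
        block = scheduler.next_block(block_size)
        if len(block) == 0:
            return
        for i in block:
            map_fn(records[i], local.emit)


def run_map_dividing(records, map_fn, combine, block_sizes):
    # One worker per entry in block_sizes, e.g. [cpu_block, gpu_block].
    scheduler = BlockScheduler(len(records))
    partials = [ReductionObject(combine) for _ in block_sizes]
    threads = [
        threading.Thread(target=worker,
                         args=(records, scheduler, size, map_fn, partial))
        for size, partial in zip(block_sizes, partials)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    result = ReductionObject(combine)
    for partial in partials:
        result.merge(partial)
    return result.values


if __name__ == "__main__":
    data = ["a b a", "b c", "a"] * 10
    print(run_map_dividing(data, word_count_map, lambda x, y: x + y, [2, 8]))
```

Because every worker reduces into its own private object and the partial results are merged only at the end, the final counts do not depend on which worker happened to process which block. A faster worker simply ends up pulling more blocks, which is the load-balancing effect that dynamic distribution is meant to provide.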