Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels

Reza Farivar
University of Illinois at Urbana-Champaign
University of Illinois at Urbana-Champaign, 2013

Committee: R.H. Campbell, W. Sanders, W.M. Hwu, A. Raghunathan




Compute-intensive data-analytics (CIDA) applications have become a major component of many business domains as well as of scientific computing. These algorithms stem from domains as diverse as web analysis and social networks, machine learning and data mining, text analysis, bioinformatics, astronomy image analysis, business analytics, large-scale graph algorithms, image/video processing and recognition, some high-performance computing problems, quantitative finance, and simulation, among others. These computational problems deal with massive data sets and require a large amount of computation per data element. This thesis presents a vision of CIDA applications programmed in a MapReduce-style framework and running on clusters of accelerators. Regardless of the type of accelerator, whether GPUs (NVIDIA or AMD), other manycore architectures (such as Intel Larrabee or MIC), or heterogeneous chips (AMD Fusion or the IBM Cell processor), a fundamental condition is imposed on the software: increased sensitivity to locality. The common theme of this thesis is therefore to increase the locality in CIDA applications. We report on four research efforts toward this goal.

The Multiple Independent Threads on a Heterogeneous Resource Architecture (MITHRA) project integrates Hadoop MapReduce with GPUs, on which the map() functions execute. Moving the map() functions to GPUs increases locality of reference and yields better performance. We have shown that when the MITHRA model is applicable (for instance, to Monte Carlo algorithms), each computing node can perform orders of magnitude more work in the same run time.

We then introduce partitioned iterative convergence (PIC) as an approach to realizing iterative algorithms on clusters. We observed that conventional MapReduce implementations of iterative algorithms are quite inefficient, for several reasons.
Complementary to prior work, we focused on two of these challenges: high network traffic due to frequent model updates, and lack of parallelism across iterations. PIC has two phases. The first, the best-effort phase, partitions the problem and runs the sub-problems on individual cluster nodes, where locality can be exploited better. The results of this phase can be numerically inaccurate (by about 3% in our experiments), but they can be computed much faster. The second phase, the top-off phase, runs the original iterative algorithm for a few more iterations, starting from the results of the best-effort phase, to compute an accurate answer.

Finally, we introduce two GPU-based projects that increase the performance of MapReduce-style functions on GPUs. The first is loop maximizing, a code transformation that eliminates control-flow divergence (and hence serialization on GPUs) and results in better utilization of GPU processing elements. Using this technique, we have achieved the highest reported speedups for gene-alignment algorithms. The second is Plasma, a library for dynamic shared-memory allocation and access on GPUs that assumes independent execution of the GPU threads, as happens in a MapReduce-style environment for both map() and reduce() functions. Together, the two MapReduce adaptations (MITHRA and PIC), the loop-maximizing optimization, and the Plasma library lay out a plan for achieving good performance on locality-sensitive clusters. This thesis shows the feasibility of this approach and describes how each project contributes to the collective target.
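As an illustration of the MITHRA idea (not code from the thesis), the sketch below shows the map/reduce split for a Monte Carlo kernel in plain Python: each map task does heavy local computation over a large batch and emits only a tiny summary, so the reduce step moves almost no data. On a MITHRA-style cluster the inner loop of `monte_carlo_map` would run on the node's GPU; here it is simulated with an ordinary loop, and all names are hypothetical.

```python
import random

def monte_carlo_map(seed, n):
    """A CIDA-style map() task: heavy computation per task
    (n random darts into the unit square), emitting only a count."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

# Each "node" runs one map task over a large local batch; locality is
# high because the batch never leaves the node during the computation.
batch = 200_000
partials = [monte_carlo_map(seed, batch) for seed in range(4)]

# The reduce step combines just four integers, so network traffic is tiny.
pi_estimate = 4.0 * sum(partials) / (4 * batch)
```

The design point this illustrates is the ratio of computation to communication: 800,000 dart throws collapse into four integers before anything crosses the (simulated) network.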
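The two-phase structure of PIC can be sketched with a toy one-dimensional k-means in plain Python (a hypothetical example under simplified assumptions, not the thesis implementation): the best-effort phase solves each partition locally with no cross-node traffic, merges the approximate local models, and the top-off phase runs a few global iterations to restore accuracy.

```python
import random

def kmeans_step(points, centroids):
    """One Lloyd iteration: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for p in points:
        i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
        clusters[i].append(p)
    return [sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)]

random.seed(0)
# Synthetic 1-D data drawn around two true centers, 1.0 and 5.0.
data = [random.gauss(1.0, 0.3) for _ in range(300)] + \
       [random.gauss(5.0, 0.3) for _ in range(300)]
random.shuffle(data)

# Best-effort phase: partition the data across "nodes" and solve each
# sub-problem locally to convergence, with no model updates on the network.
partitions = [data[i::4] for i in range(4)]
local_models = []
for part in partitions:
    c = [min(part), max(part)]          # crude local initialization
    for _ in range(10):
        c = kmeans_step(part, c)
    local_models.append(sorted(c))
# Merge the local models by averaging per centroid (an approximate answer).
merged = [sum(m[k] for m in local_models) / len(local_models)
          for k in range(2)]

# Top-off phase: a few iterations of the original global algorithm,
# started from the best-effort result, compute an accurate answer.
final = merged
for _ in range(3):
    final = kmeans_step(data, final)
```

The best-effort phase here trades a small amount of accuracy for the ability to run each partition entirely locally, which is exactly the locality argument the abstract makes for PIC.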

* * *


* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide one minute of compute time per run on each of two GPU-equipped nodes (specifications below). There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

The completed OpenCL project should be uploaded via the User dashboard (see instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
