high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels

Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels

Reza Farivar

University of Illinois at Urbana-Champaign

University of Illinois at Urbana-Champaign, 2013

@article{campbell2013adaptation,

title={Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels},

author={Campbell, R.H. and Sanders, W. and Hwu, W.M. and Raghunathan, A.},

year={2013}

}

Download (PDF)

View

Source

2963

views

Compute-intensive data-analytic (CIDA) applications have become a major component of many different business domains, as well as scientific computing applications. These algorithms stem from domains as diverse as web analysis and social networks, machine learning and data mining, text analysis, bio-informatics, astronomy image analysis, business analytics, large scale graph algorithms, image/video processing and recognition, some high performance computing problems, quantitative finance and simulation among others. These computational problems deal with massive data sets, and require performing lots of computation per data element. This thesis presents a vision of CIDA applications programmed in a MapReduce style framework and running on clusters of accelerators. Regardless of the type of accelerator, whether GPUs (NVIDIA or AMD), other manycore architectures (like Intel Larrabee or MIC) or even heterogeneous chips (AMD Fusion or IBM Cell processor), there is a fundamental condition imposed on the software, namely the increased sensitivity to locality. As a result, the common theme in this thesis is to increase the locality in CIDA applications. We report on four research efforts to achieve this goal. The Multiple independent threads on a heterogeneous resource architecture (MITHRA) project integrates Hadoop MapReduce and GPUs together, where the map() functions execute. As a result, by moving the map() functions to GPUs we increase the locality of reference and gain better performance. We have shown that when the MITHRA model is applicable (for instance for Monte Carlo algorithms), each computing node can perform orders of magnitude more work in the same run-time. Then we introduce partitioned iterative convergence (PIC) as an approach to realize iterative algorithms on clusters. We observed that conventional implementations of iterative algorithms using MapReduce are quite inefficient as a result of several factors. Complementary to prior work, we focused on addressing the challenges of high network traffic due to frequent model updates and lack of parallelism across iterations. PIC has two phases. In the first phase, called the best-effort phase, it partitions the problem and runs the sub-problems in individual cluster nodes, where the locality can be exploited better. The results of this phase can be numerically inaccurate (about 3% based on experimental results), but can be computed much faster. The second phase of PIC, called the top-off phase, runs the original iterative algorithm a few more iterations (starting with the results of the best-effort phase) to compute an accurate answer. Finally we introduce two GPU-based projects that try to increase the performance of MapReduce style functions in GPUs. The first is loop maximizing, a code transformation for GPUs that can eliminate code flow divergence (and hence serialization in GPUs) and result in better usage of GPU processing elements. Using this technique, we have achieved the highest reported speedups for gene alignment algorithms. The second GPU-based project is a library for dynamic shared memory allocation and access in GPUs assuming independent execution of the GPU threads, which happens in a MapReduce style environment among both map() and reduce() functions. The two MapReduce adaptations (MITHRA and PIC), the GPU-based loop-maximizing optimization and the Plasma library together lay the plan for the goal of achieving good performance on locality-sensitive clusters. This thesis shows the feasibility of this approach, and describes how each of these projects contributes to the collective target.

Tags: Computer science, CUDA, Heterogeneous systems, MapReduce, nVidia, nVidia GeForce GTX 275, Thesis

February 9, 2013 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels

Your response

Recent source codes

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

SeedFold: Scaling Biomolecular Structure Prediction

Tilus: A Tile-Level GPU Kernel Programming Language

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

BoltzGen:Toward Universal Binder Design

cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution

MATLAB Tensor Core models

TritonForge: Transform PyTorch Operations into Optimized GPU Kernels with LLMs

RLTune: Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters

Most viewed papers (last 30 days)

Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)