high performance computing on graphics processing units: hgpu.org


A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures

Wei Jiang

The Ohio State University

The Ohio State University, 2012

@phdthesis{jiang2012map,
  title={A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures},
  author={Jiang, W.},
  year={2012},
  school={The Ohio State University}
}


Parallel computing environments are ubiquitous nowadays, ranging from traditional CPU clusters to the emerging GPU clusters and CPU-GPU clusters, which are attractive because of their performance, cost and energy efficiency. With this trend, an important research issue is how to effectively utilize the massive computing power of these architectures to accelerate data-intensive applications arising from commercial and scientific domains. Map-reduce and its Hadoop implementation have become popular for their high programming productivity, but they exhibit non-trivial performance losses for many classes of data-intensive applications. Also, to date there is no general map-reduce-like support for programming heterogeneous systems such as a CPU-GPU cluster. Moreover, it is widely accepted that existing fault-tolerance techniques for high-end systems will not be feasible in the exascale era, and novel solutions are clearly needed. Our overall goal is to solve these programmability and performance issues by providing a map-reduce-like API with better performance efficiency as well as efficient fault-tolerance support, targeting data-intensive applications and various emerging parallel architectures. We believe that a map-reduce-like API can ease programming on these parallel architectures and, more importantly, improve the performance efficiency of parallelizing these data-intensive applications. Also, the use of a high-level programming model can greatly simplify fault-tolerance support, resulting in low-overhead checkpointing and recovery. We performed a comparative study showing that the map-reduce processing style can cause significant overheads for a set of data mining applications. Based on this observation, we developed MATE, a map-reduce system with an alternate API that uses a user-declared reduction object, to further improve the performance of map-reduce programs in multi-core environments.
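To make the contrast with the map-reduce processing style concrete, here is a minimal sketch of a generalized reduction with a user-declared reduction object, in the spirit of MATE. It assumes k-means clustering as the example application, and the names `KMeansRObj`, `reduce_point`, `combine` and `kmeans_iteration` are hypothetical, not MATE's actual API. Instead of emitting one intermediate key-value pair per input point that must then be grouped and sorted, each point is folded directly into the reduction object, and the per-split objects are merged afterwards.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

Point = Sequence[float]


@dataclass
class KMeansRObj:
    """Hypothetical reduction object for one k-means iteration:
    per-cluster coordinate sums and point counts."""
    k: int
    dim: int
    sums: List[List[float]] = field(init=False)
    counts: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.sums = [[0.0] * self.dim for _ in range(self.k)]
        self.counts = [0] * self.k


def reduce_point(robj: KMeansRObj, centroids: Sequence[Point], point: Point) -> None:
    """Local reduction: fold one point straight into the reduction object
    (no intermediate key-value pair is emitted, grouped, or sorted)."""
    nearest = min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )
    for d, x in enumerate(point):
        robj.sums[nearest][d] += x
    robj.counts[nearest] += 1


def combine(a: KMeansRObj, b: KMeansRObj) -> KMeansRObj:
    """Global combination: merge two reduction objects element-wise."""
    out = KMeansRObj(a.k, a.dim)
    for c in range(a.k):
        out.counts[c] = a.counts[c] + b.counts[c]
        out.sums[c] = [x + y for x, y in zip(a.sums[c], b.sums[c])]
    return out


def kmeans_iteration(splits: Sequence[Sequence[Point]],
                     centroids: Sequence[Point]) -> List[List[float]]:
    """One iteration: each split (one thread's or node's share of the data)
    fills its own reduction object; the objects are then combined."""
    k, dim = len(centroids), len(centroids[0])
    partials = []
    for split in splits:
        robj = KMeansRObj(k, dim)
        for point in split:
            reduce_point(robj, centroids, point)
        partials.append(robj)
    total = partials[0]
    for r in partials[1:]:
        total = combine(total, r)
    # New centroid = mean of its points; an empty cluster keeps its old centroid.
    return [
        [s / total.counts[c] for s in total.sums[c]] if total.counts[c] else list(centroids[c])
        for c in range(k)
    ]
```

For example, `kmeans_iteration([[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0), (11.0, 10.0)]], [(0.0, 0.0), (10.0, 10.0)])` returns `[[0.5, 0.0], [10.5, 10.0]]`. In this sketch the intermediate state kept per split is the reduction object itself (k·dim sums plus k counts) rather than one key-value pair per input point, which is the kind of overhead the comparative study refers to.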
To address the limitation in MATE that the reduction object must fit in memory, we extended the MATE system to support reduction objects of arbitrary size in distributed environments and applied it to a set of graph mining applications, obtaining better performance than the original graph mining library based on map-reduce. We then supported the generalized reduction API on a CPU-GPU cluster, with automatic data distribution among CPUs and GPUs to achieve the best possible heterogeneous execution of iterative applications. Finally, in our recent work, we extended the generalized reduction model to support low-overhead fault tolerance for MPI programs in our FT-MATE system. In particular, we are able to handle CPU/GPU failures in a cluster with low-overhead checkpointing, and to restart the computations on a different number of nodes. Through this work, we hope to provide useful insights for designing and implementing efficient fault-tolerance solutions for future exascale systems.
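One way to see why a reduction-based model can simplify fault tolerance: if the reduction object is the only state a worker carries, a checkpoint is just a copy of that object plus a count of how many input elements have already been folded into it. The sketch below simulates this in a single process for computing a mean. `run_worker`, `split`, `merge_robj`, `mean_with_recovery`, the checkpoint interval and the simulated crash are all hypothetical illustrations and do not describe FT-MATE's actual protocol. On recovery, the failed worker's checkpointed reduction object is kept, and only its unprocessed elements are re-split across a possibly different number of workers.

```python
import copy
from typing import Dict, List, Optional, Tuple

RObj = Dict[str, float]          # hypothetical reduction object: running sum and count
Checkpoint = Tuple[int, RObj]    # (elements folded in, snapshot of the reduction object)


def new_robj() -> RObj:
    return {"sum": 0.0, "count": 0}


def merge_robj(a: RObj, b: RObj) -> RObj:
    """Combine two reduction objects."""
    return {"sum": a["sum"] + b["sum"], "count": a["count"] + b["count"]}


def run_worker(chunk: List[float], checkpoint_every: int,
               fail_after: Optional[int] = None) -> Tuple[Optional[RObj], Checkpoint]:
    """Fold a chunk into a reduction object, snapshotting it every `checkpoint_every`
    elements. `fail_after` simulates a crash once that many elements are processed;
    a crashed worker returns None and only its last checkpoint survives."""
    robj, ckpt = new_robj(), (0, new_robj())
    for i, x in enumerate(chunk):
        if fail_after is not None and i == fail_after:
            return None, ckpt
        robj["sum"] += x
        robj["count"] += 1
        if (i + 1) % checkpoint_every == 0:
            ckpt = (i + 1, copy.deepcopy(robj))
    return robj, ckpt


def split(data: List[float], n: int) -> List[List[float]]:
    """Split data into n contiguous, nearly equal chunks."""
    q, r = divmod(len(data), n)
    chunks, start = [], 0
    for w in range(n):
        end = start + q + (1 if w < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def mean_with_recovery(data: List[float], n_workers: int, n_restart: int, failed: int,
                       checkpoint_every: int = 2, fail_after: int = 3) -> float:
    """Run on n_workers where worker `failed` crashes, then finish its remaining
    elements on n_restart workers, restarting from its checkpointed reduction object."""
    total, leftover = new_robj(), []
    for w, chunk in enumerate(split(data, n_workers)):
        robj, (done, saved) = run_worker(chunk, checkpoint_every,
                                         fail_after if w == failed else None)
        if robj is not None:
            total = merge_robj(total, robj)
        else:
            total = merge_robj(total, saved)   # restore state from the checkpoint
            leftover.extend(chunk[done:])      # redo only what came after it
    for chunk in split(leftover, n_restart):   # n_restart may differ from n_workers
        robj, _ = run_worker(chunk, checkpoint_every)
        total = merge_robj(total, robj)
    return total["sum"] / total["count"]
```

For example, `mean_with_recovery(list(range(1, 13)), n_workers=3, n_restart=2, failed=1)` still returns 6.5, the mean of 1..12. Elements processed after the last checkpoint are simply redone, so in this sketch the checkpoint interval trades checkpointing overhead against recomputation after a failure.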

Tags: Computer science, CUDA, Data mining, GPU cluster, Heterogeneous systems, MapReduce, MPI, nVidia, Tesla C2050, Thesis

September 1, 2012 by hgpu

