high performance computing on graphics processing units: hgpu.org


A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures

Wei Jiang

The Ohio State University

The Ohio State University, 2012

@phdthesis{jiang2012map,
  title={A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures},
  author={Jiang, W.},
  year={2012},
  school={The Ohio State University}
}


Parallel computing environments are now ubiquitous, ranging from traditional CPU clusters to the GPU clusters and CPU-GPU clusters that have emerged because of their performance, cost, and energy efficiency. With this trend, an important research issue is how to effectively use the massive computing power of these architectures to accelerate data-intensive applications arising from commercial and scientific domains. Map-reduce and its Hadoop implementation have become popular for their high programming productivity, but they exhibit non-trivial performance losses for many classes of data-intensive applications. In addition, there is to date no general map-reduce-like support for programming heterogeneous systems such as a CPU-GPU cluster. Finally, it is widely accepted that the existing fault-tolerance techniques for high-end systems will not be feasible in the exascale era, so novel solutions are clearly needed.

Our overall goal is to solve these programmability and performance issues by providing a map-reduce-like API with better performance as well as efficient fault-tolerance support, targeting data-intensive applications and emerging parallel architectures. We believe that a map-reduce-like API can ease programming on these parallel architectures and, more importantly, improve the efficiency of parallelizing these data-intensive applications. The use of a high-level programming model can also greatly simplify fault-tolerance support, resulting in low-overhead checkpointing and recovery. We performed a comparative study showing that the map-reduce processing style can cause significant overheads for a set of data mining applications. Based on this observation, we developed MATE, a map-reduce system with an alternate API that uses a user-declared reduction object to further improve the performance of map-reduce programs in multi-core environments.
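The contrast between the map-reduce processing style and a user-declared reduction object can be illustrated with a small sketch. This is not MATE's actual API: the function names (`mapreduce_histogram`, `local_reduction`, `combine`) and the histogram workload are assumptions chosen only for illustration. The first function stores intermediate key-value pairs and groups them before reducing; the second style folds each element directly into a reduction object and then merges the per-chunk objects.

```python
from collections import defaultdict


def mapreduce_histogram(chunks, num_bins):
    """Map-reduce style: emit (key, 1) pairs, group them, then reduce."""
    # Map phase: one intermediate pair per input element.
    pairs = [(value % num_bins, 1) for chunk in chunks for value in chunk]
    # Shuffle/group phase: every intermediate pair is stored before reducing.
    groups = defaultdict(list)
    for key, one in pairs:
        groups[key].append(one)
    # Reduce phase: sum the grouped values for each key.
    return [sum(groups[b]) for b in range(num_bins)]


def local_reduction(chunk, num_bins):
    """Fold a chunk directly into a reduction object (hypothetical name)."""
    robj = [0] * num_bins  # the user-declared reduction object
    for value in chunk:
        robj[value % num_bins] += 1  # update in place, no intermediate pairs
    return robj


def combine(a, b):
    """Merge two reduction objects element-wise (hypothetical name)."""
    return [x + y for x, y in zip(a, b)]


def generalized_reduction_histogram(chunks, num_bins):
    """Reduction-object style: process each chunk locally, then combine."""
    result = [0] * num_bins
    for chunk in chunks:  # in a real system each chunk could go to one thread
        result = combine(result, local_reduction(chunk, num_bins))
    return result
```

Both functions compute the same histogram; the second avoids materializing and grouping the intermediate pairs, which is the kind of overhead the comparative study refers to.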
To address the limitation in MATE that the reduction object must fit in memory, we extended the MATE system to support reduction objects of arbitrary size in distributed environments and applied it to a set of graph mining applications, obtaining better performance than the original graph mining library based on map-reduce. We then supported the generalized reduction API in a CPU-GPU cluster, with automatic data distribution among CPUs and GPUs to achieve the best possible heterogeneous execution of iterative applications. Finally, in our recent work, we extended the generalized reduction model to support low-overhead fault tolerance for MPI programs in our FT-MATE system. In particular, we are able to deal with CPU/GPU failures in a cluster using low-overhead checkpointing, and to restart the computation on a different number of nodes. Through this work, we hope to provide useful insights for designing and implementing efficient fault-tolerance solutions for future exascale systems.
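One way to see why a reduction-object model could make checkpointing cheap and allow a restart on a different number of nodes is the sketch below. It is a sequential simulation under stated assumptions, not FT-MATE's implementation: it assumes a checkpoint consists only of the reduction object plus the list of chunks already folded into it, and the names `fold_chunk`, `run`, and `fail_after` are hypothetical. No MPI or GPU code is involved.

```python
import copy


def fold_chunk(robj, chunk):
    """Fold one data chunk into the reduction object (running sum and count)."""
    robj["sum"] += sum(chunk)
    robj["count"] += len(chunk)


def run(chunks, num_nodes, checkpoint=None, fail_after=None):
    """Simulate num_nodes workers folding chunks into one reduction object.

    Returns (state, finished). If fail_after is set, a failure is simulated
    once that many chunks have been processed and the last checkpoint is
    returned instead of the final state.
    """
    if checkpoint is not None:
        state = copy.deepcopy(checkpoint)
    else:
        state = {"robj": {"sum": 0, "count": 0}, "done": []}

    # Only chunks not covered by the checkpoint still need processing.
    pending = [i for i in range(len(chunks)) if i not in state["done"]]
    # Redistribute the remaining chunks round-robin over the available nodes.
    plan = [pending[n::num_nodes] for n in range(num_nodes)]

    snapshot = copy.deepcopy(state)
    processed = 0
    for node_chunks in plan:
        for i in node_chunks:
            if fail_after is not None and processed == fail_after:
                return snapshot, False  # simulated failure: fall back to checkpoint
            fold_chunk(state["robj"], chunks[i])
            state["done"].append(i)
            processed += 1
            # Checkpoint: only the small reduction object and progress are saved.
            snapshot = copy.deepcopy(state)
    return state, True
```

For example, `run(chunks, 4, fail_after=2)` stops after two chunks and returns a checkpoint, and `run(chunks, 2, checkpoint=ckpt)` then finishes the remaining chunks on two simulated nodes. Because the saved state does not depend on how chunks were assigned, the node count can change between the two calls.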

Tags: Computer science, CUDA, Data mining, GPU cluster, Heterogeneous systems, MapReduce, MPI, nVidia, Tesla C2050, Thesis

September 1, 2012 by hgpu

