Vispark: GPU-Accelerated Distributed Visual Computing Using Spark
School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
SIAM Journal on Scientific Computing (SISC), 2016
@article{woohyuk2016vispark,
author={Woohyuk Choi and Sumin Hong and Won-Ki Jeong},
title={Vispark: GPU-Accelerated Distributed Visual Computing Using Spark},
journal={SIAM Journal on Scientific Computing (SISC)},
publisher={Society for Industrial and Applied Mathematics},
year={2016}
}
With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model and system architecture. To address these problems, we propose Vispark, a novel extension of Spark [26] for GPU-accelerated MapReduce processing on array-based scientific computing and image processing tasks. Vispark provides an easy-to-use, Python-like high-level language syntax and a novel data abstraction for MapReduce programming on a GPU cluster system. Vispark introduces a programming abstraction for accessing neighbor data in the mapper function, which greatly simplifies many image processing tasks using MapReduce by reducing memory footprints and bypassing the reduce stage. Vispark provides socket-based halo communication that synchronizes data partitions transparently to users, which is necessary for many scientific computing problems in distributed systems. Vispark also provides domain-specific functions and language support specifically designed for high-performance computing and image processing applications. We demonstrate the performance of our prototype system on several visual computing tasks, such as image processing, volume rendering, K-means clustering, and heat transfer simulation.
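The neighbor-access and halo ideas described in the abstract can be illustrated with a minimal plain-Python sketch. This is not Vispark's API or syntax: the function names (`split`, `with_halo`, `smooth`, `distributed_smooth`) are hypothetical, the data is a 1D list rather than a GPU-resident array, and an in-process copy of boundary cells stands in for the socket-based exchange. It assumes a halo width of one cell and non-empty partitions. The point is only that once each partition carries a copy of its neighbors' boundary cells, a stencil can run entirely in the map stage with no reduce step.

```python
def split(data, parts):
    """Split a list into `parts` contiguous partitions of near-equal size."""
    size, extra = divmod(len(data), parts)
    out, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)
        out.append(data[start:stop])
        start = stop
    return out


def with_halo(partitions):
    """Pad each partition with one halo cell copied from each neighbor.

    At the outer boundaries the edge value is repeated (clamping).
    Assumes every partition holds at least one element.
    """
    padded = []
    last = len(partitions) - 1
    for i, part in enumerate(partitions):
        left = partitions[i - 1][-1] if i > 0 else part[0]
        right = partitions[i + 1][0] if i < last else part[-1]
        padded.append([left] + part + [right])
    return padded


def smooth(padded):
    """Mapper: 3-point average for each interior cell of a padded partition."""
    return [(padded[j - 1] + padded[j] + padded[j + 1]) / 3.0
            for j in range(1, len(padded) - 1)]


def distributed_smooth(data, parts):
    """Apply the mapper to each halo-padded partition and concatenate."""
    result = []
    for padded in with_halo(split(data, parts)):
        result.extend(smooth(padded))
    return result


data = [0.0, 0.0, 9.0, 0.0, 0.0, 3.0, 3.0, 0.0]
# Reference: the same stencil applied to the whole array at once.
reference = smooth([data[0]] + data + [data[-1]])
partitioned = distributed_smooth(data, 3)
```

Here `partitioned` matches `reference` cell for cell, because every mapper sees the same neighbor values it would see in the unpartitioned array. An iterative computation such as heat transfer would repeat the halo refresh before each step.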
November 8, 2016 by hgpu