Evaluation of streaming aggregation on parallel hardware architectures

Scott Schneider, Henrique Andrade, Bugra Gedik, Kun-Lung Wu, Dimitrios S. Nikolopoulos

Virginia Tech. Department of Computer Science, Blacksburg, VA, USA

Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems, DEBS ’10, 2010

DOI:10.1145/1827418.1827467

@inproceedings{schneider2010evaluation,
  title={Evaluation of streaming aggregation on parallel hardware architectures},
  author={Schneider, S. and Andrade, H. and Gedik, B. and Wu, K.L. and Nikolopoulos, D.S.},
  booktitle={Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems},
  pages={248--257},
  year={2010},
  organization={ACM}
}

We present a case study of parallelizing streaming aggregation on three different parallel hardware architectures. Aggregation is a performance-critical operation for data summarization in stream computing and is common in sense-and-respond applications. Currently available commodity parallel hardware shows promise as an accelerator for streaming aggregation. However, how best to map streaming aggregation onto these different parallel architectures is still an open question. Streaming aggregation is inherently data parallel, but in practice, as we will demonstrate, its performance depends more on efficient data movement than on computation. Furthermore, our workloads, such as stock market data, introduce unique data distribution problems. The three parallel architectures in our study are an Intel Core 2 Quad processor, an Nvidia GTX 285 GPU and the IBM PowerXCell 8i, an enhanced version of the Cell Broadband Engine architecture. Our implementations use OpenMP, CUDA and Cellgen (a compiler providing OpenMP-like support on the Cell), respectively. We find that the Cell's programmable local storage and its low-latency, high-bandwidth access to main memory make it the best suited of the three for parallelizing streaming aggregation. Future GPUs could overcome their latency and bandwidth limitations by being fully integrated into the system's memory hierarchy. To attain good performance on existing parallel architectures, we find that developers must characterize their problem in terms of communication versus computation costs; memory access patterns, including whether their algorithms reuse data; and the granularity of data accesses.
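
To make the points about data parallelism and data distribution concrete, here is a minimal Python sketch. It is not the authors' OpenMP, CUDA or Cellgen code. It assumes the aggregate is a count-based sliding-window average of trade prices per stock symbol, and the function names `sliding_window_avg`, `partition_by_symbol`, `partition_sizes` and `parallel_aggregate` are hypothetical. Each symbol's window is independent, so symbols can be aggregated in parallel. The regrouping step that routes every tuple to its symbol's partition is pure data movement, and the per-symbol tuple counts show how an uneven distribution of trades across symbols turns into unbalanced parallel work.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor


def sliding_window_avg(prices, window):
    """Emit the mean of the last `window` prices after each arriving tuple.

    Assumption: a count-based sliding window with an average as the aggregate.
    """
    buf = deque()
    total = 0.0
    out = []
    for p in prices:
        buf.append(p)
        total += p
        if len(buf) > window:
            total -= buf.popleft()  # evict the oldest tuple from the window
        out.append(total / len(buf))
    return out


def partition_by_symbol(trades):
    """Route each (symbol, price) tuple to its symbol's partition.

    This regrouping only moves data; no aggregation happens here.
    """
    parts = defaultdict(list)
    for symbol, price in trades:
        parts[symbol].append(price)
    return parts


def partition_sizes(trades):
    """Tuples per symbol; a skewed result means unbalanced parallel work."""
    return {s: len(p) for s, p in partition_by_symbol(trades).items()}


def parallel_aggregate(trades, window, workers=4):
    """Aggregate each symbol's partition independently on a worker pool."""
    parts = partition_by_symbol(trades)

    def work(item):
        symbol, prices = item
        return symbol, sliding_window_avg(prices, window)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order; dict() drains it before the pool closes.
        return dict(pool.map(work, parts.items()))


if __name__ == "__main__":
    trades = [("IBM", 1.0), ("AAPL", 10.0), ("IBM", 2.0),
              ("IBM", 3.0), ("AAPL", 20.0), ("IBM", 4.0)]
    print(parallel_aggregate(trades, window=2))
    print(partition_sizes(trades))
```

In CPython the thread pool illustrates the decomposition rather than delivering a speedup, because of the global interpreter lock. On the architectures studied, each partition could instead be assigned to a CPU core, a Cell SPE or a group of GPU threads, and the cost of getting each partition's tuples to that unit is exactly the data-movement cost the abstract emphasizes.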

Tags: Cell processor, Computer science, CUDA, nVidia, nVidia GeForce GTX 285, OpenMP, Performance

August 22, 2011 by hgpu