high performance computing on graphics processing units: hgpu.org

Evaluation of streaming aggregation on parallel hardware architectures

Scott Schneider, Henrique Andrade, Bugra Gedik, Kun-Lung Wu, Dimitrios S. Nikolopoulos

Virginia Tech. Department of Computer Science, Blacksburg, VA, USA

Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems, DEBS ’10, 2010

DOI:10.1145/1827418.1827467

We present a case study parallelizing streaming aggregation on three different parallel hardware architectures. Aggregation is a performance-critical operation for data summarization in stream computing, and is commonly found in sense-and-respond applications. Currently available commodity parallel hardware shows promise as an accelerator for streaming aggregation. However, how streaming aggregation maps to the different parallel architectures is still an open question. Streaming aggregation is obviously data parallel, but in practice its performance relies more on efficient data movement than on computation, as we will demonstrate. Furthermore, we use workloads such as stock market data, which introduce unique data distribution problems. The three parallel architectures we use in our study are an Intel Core 2 Quad processor, an Nvidia GTX 285 GPU and the IBM PowerXCell 8i, an enhanced version of the Cell Broadband Engine architecture. Our implementations use OpenMP, CUDA and Cellgen (a compiler for OpenMP-like support on Cell), respectively. We find that the Cell’s programmable local storage and its low-latency, high-bandwidth access to main memory are best suited for parallelizing streaming aggregation. GPUs in the future can overcome the latency and bandwidth limitations by being fully integrated in the system’s memory hierarchy. In order to attain good performance on existing parallel architectures, we find that developers must characterize their problem in terms of communication versus computation costs; memory access patterns, including assessing whether their algorithms reuse data; and the granularity of data access patterns.
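As a rough illustration of the kind of operation the abstract describes, the following is a minimal sequential sketch of a per-symbol sliding-window average over stock ticks. It is not the paper's implementation: the function name `sliding_window_average`, the count-based window, and the sample ticks are all assumptions made for this example. Each symbol's window is independent of the others, which is what makes the operation data parallel, while the work per tick is only one add, one subtract and one divide, so moving the data tends to cost more than computing on it.

```python
from collections import defaultdict, deque


def sliding_window_average(ticks, window_size):
    """Return (symbol, running average) for each tick.

    Hypothetical helper: keeps a count-based sliding window per symbol
    and maintains a running sum so each tick costs O(1) work.
    """
    windows = defaultdict(lambda: deque(maxlen=window_size))
    sums = defaultdict(float)
    results = []
    for symbol, price in ticks:
        window = windows[symbol]
        if len(window) == window_size:
            # Subtract the oldest price before the deque evicts it.
            sums[symbol] -= window[0]
        window.append(price)
        sums[symbol] += price
        results.append((symbol, sums[symbol] / len(window)))
    return results


# Made-up sample data; a skewed stream (many ticks for one symbol)
# would concentrate work on that symbol's window.
ticks = [("IBM", 10.0), ("IBM", 20.0), ("AAPL", 5.0), ("IBM", 30.0)]
averages = sliding_window_average(ticks, window_size=2)
```

A parallel version could assign symbols to different workers, but an uneven distribution of ticks across symbols would then leave some workers idle, which is one plausible reading of the data distribution problems the abstract mentions.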

Tags: Cell processor, Computer science, CUDA, nVidia, nVidia GeForce GTX 285, OpenMP, Performance

August 22, 2011 by hgpu
