A Yoke of Oxen and a Thousand Chickens for Heavy Lifting Graph Processing
Department of Electrical and Computer Engineering, The University of British Columbia
IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques (PACT 2012), 2012
@article{gharaibeh2012totem,
title={TOTEM: Graph Processing on Heterogeneous CPU and GPU Platforms},
author={Gharaibeh, A. and Santos-Neto, E. and Costa, L.B. and Ripeanu, M.},
year={2012}
}
Large, real-world graphs are famously difficult to process efficiently. Not only they have a large memory footprint but most graph processing algorithms entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs and simultaneously achieve access locality and load-balancing is difficult if not impossible. This paper demonstrates the feasibility of graph processing on heterogeneous (i.e., including both CPUs and GPUs) platforms as a cost-effective approach towards addressing the graph processing challenges above. To this end, this work (i) presents and evaluates a performance model that estimates the achievable performance on heterogeneous platforms; (ii) introduces TOTEM – a processing engine based on the Bulk Synchronous Parallel (BSP) model that offers a convenient environment to simplify the implementation of graph algorithms on heterogeneous platforms; and, (iii) demonstrates TOTEM’S efficiency by implementing and evaluating two graph algorithms (PageRank and breadth-first search). TOTEM achieves speedups close to the model’s prediction, and applies a number of optimizations that enable linear speedups with respect to the share of the graph offloaded for processing to accelerators.
July 16, 2012 by hgpu
Your response
You must be logged in to post a comment.