Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems

Abdullah Gharaibeh, Elizeu Santos-Neto, Lauro Beltrao Costa, Matei Ripeanu
The University of British Columbia
arXiv:1312.3018 [cs.DC], (11 Dec 2013)


   author={Gharaibeh}, A. and {Santos-Neto}, E. and {Beltrao Costa}, L. and {Ripeanu}, M.},

   title={"{Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems}"},

   journal={ArXiv e-prints},




   keywords={Computer Science – Distributed, Parallel, and Cluster Computing},




   adsnote={Provided by the SAO/NASA Astrophysics Data System}


Download Download (PDF)   View View   Source Source   Source codes Source codes




The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However, real-world graphs are famously difficult to process efficiently. Not only they have a large memory footprint, but also most graph algorithms entail memory access patterns with poor locality, data-dependent parallelism and a low compute-to-memory access ratio. Moreover, most real-world graphs have a highly heterogeneous node degree distribution, hence partitioning these graphs for parallel processing and simultaneously achieving access locality and load-balancing is difficult. This work starts from the hypothesis that hybrid platforms (e.g., GPU-accelerated systems) have both the potential to cope with the heterogeneous structure of real graphs and to offer a cost-effective platform for high-performance graph processing. This work assesses this hypothesis and presents an extensive exploration of the opportunity to harness hybrid systems to process large-scale graphs efficiently. In particular, (i) we present a performance model that estimates the achievable performance on hybrid platforms; (ii) informed by the performance model, we design and develop TOTEM – a processing engine that provides a convenient environment to implement graph algorithms on hybrid platforms; (iii) we show that further performance gains can be extracted using partitioning strategies that aim to produce partitions that each matches the strengths of the processing element it is allocated to, finally, (iv) we demonstrate the performance advantages of the hybrid system through a comprehensive evaluation that uses real and synthetic workloads (as large as 16 billion edges), multiple graph algorithms that stress the system in various ways, and a variety of hardware configurations.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2019 hgpu.org

All rights belong to the respective authors

Contact us: