A Yoke of Oxen and a Thousand Chickens for Heavy Lifting Graph Processing

Abdullah Gharaibeh, Lauro Beltrao Costa, Elizeu Santos-Neto, Matei Ripeanu

Department of Electrical and Computer Engineering, The University of British Columbia

IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques (PACT 2012), 2012

Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory-access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs while simultaneously achieving access locality and load balancing is difficult, if not impossible. This paper demonstrates the feasibility of graph processing on heterogeneous platforms (i.e., platforms that include both CPUs and GPUs) as a cost-effective approach to addressing the graph processing challenges above. To this end, this work (i) presents and evaluates a performance model that estimates the achievable performance on heterogeneous platforms; (ii) introduces TOTEM, a processing engine based on the Bulk Synchronous Parallel (BSP) model that offers a convenient environment to simplify the implementation of graph algorithms on heterogeneous platforms; and (iii) demonstrates TOTEM's efficiency by implementing and evaluating two graph algorithms (PageRank and breadth-first search). TOTEM achieves speedups close to the model's prediction, and applies a number of optimizations that enable linear speedups with respect to the share of the graph offloaded for processing to accelerators.
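To make the BSP idea concrete, the following is a minimal sketch of breadth-first search over a partitioned graph, written in plain Python. It is not TOTEM's API or implementation: the function name `bsp_bfs`, the `owner` map, and the two-partition split (standing in for a CPU partition and an accelerator partition) are all assumptions made for illustration. Each superstep has a computation phase, in which a partition updates only the vertices it owns, followed by a communication phase, in which updates for remote vertices are delivered before the next superstep starts.

```python
INF = float("inf")

def bsp_bfs(adj, owner, source, num_parts=2):
    """Level-synchronous BFS in a BSP style (illustrative sketch only).

    adj    : dict mapping each vertex to a list of its neighbours
    owner  : dict mapping each vertex to a partition id (hypothetical split,
             e.g. 0 for the CPU partition and 1 for the accelerator partition)
    source : vertex at which the traversal starts
    """
    level = {v: INF for v in adj}
    level[source] = 0
    frontier = [set() for _ in range(num_parts)]
    frontier[owner[source]].add(source)
    superstep = 0

    while any(frontier):
        outbox = [[] for _ in range(num_parts)]       # messages per destination
        next_frontier = [set() for _ in range(num_parts)]

        # Computation phase: each partition expands its own frontier and
        # writes only to vertices it owns; remote visits become messages.
        for p in range(num_parts):
            for u in frontier[p]:
                for v in adj[u]:
                    if owner[v] == p:
                        if level[v] == INF:
                            level[v] = superstep + 1
                            next_frontier[p].add(v)
                    else:
                        outbox[owner[v]].append(v)

        # Communication phase: deliver messages, then synchronise (the end
        # of this loop acts as the barrier between supersteps).
        for p in range(num_parts):
            for v in outbox[p]:
                if level[v] == INF:
                    level[v] = superstep + 1
                    next_frontier[p].add(v)

        frontier = next_frontier
        superstep += 1

    return level
```

In this sketch the partitions run one after another in a single process; on a real heterogeneous platform the computation phases would run concurrently on the CPU and the GPU, and the communication phase would involve transfers between host and device memory.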

Tags: Computer science, CUDA, Graph theory, Heterogeneous systems, nVidia, Optimization, Performance, Programming Languages, Programming techniques, Tesla C2050

July 16, 2012 by hgpu
