10243

GPU Acceleration of Graph Matching, Clustering, and Partitioning

Bastiaan Onne Fagginger Auer
Utrecht University
Utrecht University, 2013

@article{auer2013gpu,

   title={GPU Acceleration of Graph Matching, Clustering, and Partitioning},

   author={Auer, BO Fagginger},

   year={2013},

   publisher={Utrecht University}

}

Download Download (PDF)   View View   Source Source   

2477

views

We consider sequential algorithms for hypergraph partitioning and GPU (i.e., fine-grained shared-memory parallel) algorithms for graph partitioning and clustering. Our investigation into sequential hypergraph partitioning is concerned with the efficient construction of high-quality matchings for hypergraph coarsening and optimisation with respect to general hypergraph partitioning quality metrics. We introduce the l*(l-1)-metric which exactly measures the communication volume for a finite element computation, and show how to use an ordinary hypergraph bipartitioner to greedily optimise a partitioning with respect to a general quality metric. Graph partitioning and clustering on the GPU is achieved by implementing all parts of the multi-level paradigm (i.e., matching, coarsening, and refinement) on the GPU. We first develop GPU algorithms for matching and coarsening. These are then used as building blocks for a greedy agglomerative modularity clustering heuristic, with which we participated in the 10th DIMACS partitioning and clustering challenge. By combining the parallel matching and coarsening algorithms with a parallel partitioning refinement method and implementing these algorithms using general sparse matrix-vector multiplication operations, we are able to perform graph partitioning entirely on the GPU. The GPU partitioning algorithm is compared both in terms of quality and speed to the sequential METIS graph partitioner and is faster for graphs with a million or more vertices, while offering similar quality. The highest achieved speedup over METIS is 6.2, for which a graph with 24 million vertices and 29 million edges is partitioned into two parts in 3.7 seconds on the GPU (an NVIDIA Tesla C2075) with an edge cut of 329. This shows that the GPU can effectively be used for the multi-level analysis of large graphs.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: