GPU Acceleration of Graph Matching, Clustering, and Partitioning

hgpu.org » Programming » Algorithms » GPU Acceleration of Graph Matching, Clustering, and Partitioning

GPU Acceleration of Graph Matching, Clustering, and Partitioning

Bastiaan Onne Fagginger Auer

Utrecht University

Utrecht University, 2013

BibTeX

Download (PDF)

View

Source

2685

views

We consider sequential algorithms for hypergraph partitioning and GPU (i.e., fine-grained shared-memory parallel) algorithms for graph partitioning and clustering. Our investigation into sequential hypergraph partitioning is concerned with the efficient construction of high-quality matchings for hypergraph coarsening and optimisation with respect to general hypergraph partitioning quality metrics. We introduce the l*(l-1)-metric which exactly measures the communication volume for a finite element computation, and show how to use an ordinary hypergraph bipartitioner to greedily optimise a partitioning with respect to a general quality metric. Graph partitioning and clustering on the GPU is achieved by implementing all parts of the multi-level paradigm (i.e., matching, coarsening, and refinement) on the GPU. We first develop GPU algorithms for matching and coarsening. These are then used as building blocks for a greedy agglomerative modularity clustering heuristic, with which we participated in the 10th DIMACS partitioning and clustering challenge. By combining the parallel matching and coarsening algorithms with a parallel partitioning refinement method and implementing these algorithms using general sparse matrix-vector multiplication operations, we are able to perform graph partitioning entirely on the GPU. The GPU partitioning algorithm is compared both in terms of quality and speed to the sequential METIS graph partitioner and is faster for graphs with a million or more vertices, while offering similar quality. The highest achieved speedup over METIS is 6.2, for which a graph with 24 million vertices and 29 million edges is partitioned into two parts in 3.7 seconds on the GPU (an NVIDIA Tesla C2075) with an edge cut of 329. This shows that the GPU can effectively be used for the multi-level analysis of large graphs.

Tags: Algorithms, Clustering, Computer science, CUDA, Graph theory, nVidia, Sparse matrix, Tesla C2075, Thesis

August 6, 2013 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org