10472

Fast Detection of Overlapping Communities via Online Tensor Methods on GPUs

Furong Huang, Niranjan U N, Mohammad Umar Hakeem, Prateek Verma, Animashree Anandkumar
Electrical Engineering and Computer Science Dept., University of California, Irvine, USA 92697
arXiv:1309.0787 [cs.LG], (3 Sep 2013)

@article{2013arXiv1309.0787H,

   author={Huang}, F. and {N}, N.~U and {Umar Hakeem}, M. and {Verma}, P. and {Anandkumar}, A.},

   title={"{Fast Detection of Overlapping Communities via Online Tensor Methods on GPUs}"},

   journal={ArXiv e-prints},

   archivePrefix={"arXiv"},

   eprint={1309.0787},

   primaryClass={"cs.LG"},

   keywords={Computer Science – Learning, Computer Science – Distributed, Parallel, and Cluster Computing, Computer Science – Social and Information Networks, Statistics – Machine Learning},

   year={2013},

   month={sep},

   adsurl={http://adsabs.harvard.edu/abs/2013arXiv1309.0787H},

   adsnote={Provided by the SAO/NASA Astrophysics Data System}

}

Download Download (PDF)   View View   Source Source   

2182

views

We present a scalable tensor-based approach for detecting hidden overlapping communities under the mixed membership stochastic block model. We employ stochastic gradient descent for performing tensor decompositions, which provides flexibility to tradeoff node sub-sampling with accuracy. Our GPU implementation of the tensor-based approach is extremely fast and scalable, and involves a careful optimization of GPU-CPU storage and communication. We validate our results on datasets from popular social networks (Facebook, Yelp and DBLP), where ground truth is available, using notions of $p$-values and false discovery rates, and obtain high accuracy for membership recovery. We compare our results, both in terms of execution time and accuracy, to the state-of-the-art algorithms such as the variational method, and report better performance. For instance, on the Yelp network consisting of about 40,000 nodes and 500 communities, we recover the latent communities in under 30 minutes, and on the DBLP network consisting of about 120,000 nodes and 500 communities, we recover the latent communities in about 2.8 hours. In comparison, the variational method takes more than an order of magnitude higher execution time on the same datasets.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: