Clustering coefficient queries on massive dynamic social networks
IBM Research – China
In Proceedings of the 11th international conference on Web-age information management (2010), pp. 115-126.
@conference{liu2010clustering,
title={Clustering coefficient queries on massive dynamic social networks},
author={Liu, Z. and Wang, C. and Zou, Q. and Wang, H.},
booktitle={Proceedings of the 11th international conference on Web-age information management},
pages={115–126},
isbn={3642142451},
year={2010},
organization={Springer-Verlag}
}
The Clustering Coefficient (CC) is a fundamental measure in social network analysis assessing the degree to which nodes tend to cluster together. While CC computation on static graphs is well studied, emerging applications have new requirements for online query of the “global” CC of a given subset of a graph. As social networks are widely stored in databases for easy updating and accessing, computing CC of their subset becomes a time-consuming task, especially when the network grows large and cannot fit in memory. This paper presents a novel method called “Approximate Neighborhood Index (ANI)” to significantly reduce the query latency for CC computation compared to traditional SQL based database queries. A Bloom-filter-like data structure is leveraged to construct ANI in front of a relational database. Experimental results show that the proposed approach can guarantee the correctness of a CC query while significantly reducing the query latency at a reasonable memory cost.
April 11, 2011 by hgpu