High quality graph-based similarity search


SimRank is an influential link-based similarity measure that has been used in many fields of Web search and sociometry. The best-of-breed method by Kusumoto et. al., however, does not always deliver high-quality results, since it fails to accurately obtain its diagonal correction matrix D. Besides, SimRank is also limited by an unwanted "connectivity trait": increasing the number of paths between nodes a and b often incurs a decrease in score s(a,b). The best-known solution, SimRank++, cannot resolve this problem, since a revised score will be zero if a and b have no common in-neighbors. In this paper, we consider high-quality similarity search. Our scheme, SR#, is efficient and semantically meaningful: (1) We first formulate the exact D, and devise a "varied-D" method to accurately compute SimRank in linear memory. Moreover, by grouping computation, we also reduce the time of from quadratic to linear in the number of iterations. (2) We design a "kernel-based" model to improve the quality of SimRank, and circumvent the "connectivity trait" issue. (3) We give mathematical insights to the semantic difference between SimRank and its variant, and correct an argument: "if D is replaced by a scaled identity matrix, top-K rankings will not be affected much". The experiments confirm that SR# can accurately extract high-quality scores, and is much faster than the state-of-the-art competitors.

Publication DOI: https://doi.org/10.1145/2766462.2767720
Divisions: Engineering & Applied Sciences
Engineering & Applied Sciences > Systems analytics research institute (SARI)
Event Title: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Event Type: Other
Event Dates: 2015-08-06 - 2015-08-13
Uncontrolled Keywords: link analysis,graph-based similarity,high quality search
ISBN: 978-1-4503-3621-5
PURE Output Type: Conference contribution
Published Date: 2015-08-09
Authors: Yu, Weiren
McCann, Julie Ann



Version: Accepted Version

Export / Share Citation


Additional statistics for this record