Similarity Search over Network Structure

Abstract

With the advent of the Internet, graph-structured data are ubiquitous. An essential task for graph-structured data management is similarity search based on graph topology, with a wide spectrum of applications, e.g., web search, outlier detection, co-citation analysis, and collaborative filtering. These graph topology data arrive from multiple sources at an astounding velocity, volume and veracity. While the scale of network structured data is increasing, existing similarity search algorithms on large graphs are impractical due to their expensive costs in terms of computational time and memory space. Moreover, dynamic changes (e.g., noise and abnormality) exists in network data, and it arises from many factors, such as data loss in transfer, data incompleteness, and dirty reading. Thus, the dynamic changes have become the main barrier to gaining accurate results for efficient network analysis. In real Web applications, CoSimRank has been proposed as a robust measure of node-pair similarity based on graph topology. It follows a SimRank-like notion that “two nodes are considered as similar if their in-neighbours are similar”, but the similarity of each node with itself is not constantly 1, which is different from SimRank. However, existing work on CoSimRank is restricted to static graphs. Each node pair CoSimRank score is retrieved from the sum of dot products of two Personalised PageRank vectors. When the graph is updated with edges (nodes) addition and deletion over time, it is cost-inhibitive to recompute all CoSimRank scores from scratch, which is impractical. RoleSim is a popular graph-structural role similarity search measure with many applications (e.g., sociometry), it can get the automorphic equivalence of nodes pair similarity, which SimRank and CoSimRank lack. But the accuracy of RoleSim algorithm can be improved. In this study, (1) we propose fast dynamic scheme, D-CoSim and D-deCoSim, for accurate CoSimRank search over large-scale evolving graphs. (2) Based on D-CoSim, we also propose fast scheme, F-CoSim and Opt_F-CoSim, which greatly accelerates CoSimRank search over static graphs. Our theoretical analysis shows that D-CoSim, D-deCoSim F-CoSim and Opt_F-CoSim guarantee the exactness of CoSimRank scores. Experimental evaluations verify the superiority of D-CoSim and D-deCoSim over evolving graphs, and the fast speedupof F-CoSim and Opt_F-CoSim on large-scale static graphs against its competitors, without any loss of accuracy. (3) We propose a novel role similarity search algorithm FaRS, and a speedup algorithm Opt_FaRS, which guarantees the automorphic equivalence capture, and captures the information from the neighbour’s class. The experimental results of FaRS and Opt_FaRS show that our algorithms achieves higher accuracy than baseline algorithms.

Additional Information: © Fan Wang, 2021. Fan Wang asserts her moral right to be identified as the author of this thesis. This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without appropriate permission or acknowledgement. If you have discovered material in Aston Publications Explorer which is unlawful e.g. breaches copyright, (either yours or that of a third party) or any other law, including but not limited to those relating to patent, trademark, confidentiality, data protection, obscenity, defamation, libel, then please read our Takedown Policy and contact the service immediately.
Institution: Aston University
Uncontrolled Keywords: Social Network,SimRank Model,Role-based similarity,Web search
Last Modified: 08 Dec 2023 08:59
Date Deposited: 27 Sep 2022 17:01
Completed Date: 2021-12
Authors: Wang, Fan

Export / Share Citation


Statistics

Additional statistics for this record