LG AI IR ML · Apr 17, 2018

Scalable attribute-aware network embedding with locality

Weiyi Liu, Zhining Liu, Toyotaro Suzumura, Guangmin Hu

arXiv:1804.07152v2 · 0.8

Originality: Incremental advance

AI Analysis

This work addresses scalability issues in network embedding for domains that require joint analysis of topology and attributes, though it is incremental because it builds on existing joint embedding methods.

The paper tackles scalable joint network embedding from topology and attributes by proposing SANE, which uses locality to align the local linear relationships between each node and its K-nearest neighbors. It reports up to a 71.4% performance gain over a topology-only method and linear time complexity, with the joint-embedding learning step taking about 10 seconds on a 100,000-node network.

Adding node attributes to network embedding improves the ability of the learned joint representation to capture features from topology and attributes simultaneously. Recent work on joint embedding has shown promising performance on a variety of tasks by embedding the two spaces jointly. However, because these approaches depend on global information, they do not scale well. Here we propose SANE, a scalable attribute-aware network embedding algorithm with locality, to learn a joint representation from topology and attributes. By enforcing the alignment of the local linear relationship between each node and its K-nearest neighbors in both topology and attribute space, the joint embedding is more informative than a representation learned from topology or attributes alone. We argue that locality is the key to learning the joint representation at scale in SANE. Using several real-world networks from diverse domains, we demonstrate the efficacy of SANE in terms of both performance and scalability. For label classification, SANE achieves the highest F1-score on most datasets compared with other state-of-the-art joint representation algorithms, and comes close to a baseline method that requires label information as an extra input. Moreover, SANE achieves up to a 71.4% performance gain over a single topology-based algorithm. For scalability, we demonstrate that SANE has linear time complexity, and we observe that when the network scales to 100,000 nodes, the "learning joint embedding" step of SANE takes only about 10 seconds.
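The locality idea described in the abstract resembles locally linear embedding (LLE): each node is expressed as a linear combination of its K nearest neighbors, and the embedding is asked to preserve those relationships. The sketch below is not SANE's objective or optimizer, which the abstract does not give. It assumes LLE-style reconstruction weights on the attribute side, hypothetical uniform 1/degree weights on the topology side, and a summed reconstruction error as an illustrative "alignment" score; the names `local_weights`, `topology_weights`, and `alignment_loss` are made up for this example. Because each node only touches its K neighbors, the weight and loss computations grow linearly with the number of nodes N for fixed K. The brute-force `knn` helper is O(N^2) and is included only to keep the example self-contained.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(points: List[Vector], k: int) -> List[List[int]]:
    """Brute-force K nearest neighbors of each point, excluding the point itself.

    O(N^2) for clarity only; a scalable version would use an approximate index.
    """
    result = []
    for i, p in enumerate(points):
        others = sorted((sq_dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_weights(points: List[Vector], neighbors: List[List[int]],
                  reg: float = 1e-3) -> List[Dict[int, float]]:
    """LLE-style weights: reconstruct each point from its neighbors, weights summing to 1."""
    weights = []
    for i, nbrs in enumerate(neighbors):
        # Offsets of each neighbor from the center point.
        diffs = [[q - p for q, p in zip(points[j], points[i])] for j in nbrs]
        k = len(nbrs)
        # Local Gram matrix of the offsets, ridge-regularized so it is invertible.
        gram = [[sum(u * v for u, v in zip(diffs[a], diffs[b])) for b in range(k)]
                for a in range(k)]
        trace = sum(gram[a][a] for a in range(k))
        ridge = reg * trace if trace > 0 else reg
        for a in range(k):
            gram[a][a] += ridge
        w = solve(gram, [1.0] * k)
        total = sum(w)
        weights.append({j: wj / total for j, wj in zip(nbrs, w)})
    return weights


def topology_weights(adj: Dict[int, List[int]]) -> List[Dict[int, float]]:
    """Hypothetical topology-side weights: uniform 1/degree over graph neighbors.

    Assumes nodes are numbered 0..N-1 and every node has at least one neighbor.
    """
    return [{j: 1.0 / len(adj[i]) for j in adj[i]} for i in sorted(adj)]


def alignment_loss(embedding: List[Vector], weights: List[Dict[int, float]]) -> float:
    """Sum of squared errors when each embedded node is rebuilt from its neighbors' embeddings."""
    loss = 0.0
    for i, w in enumerate(weights):
        recon = [0.0] * len(embedding[i])
        for j, wj in w.items():
            for d, v in enumerate(embedding[j]):
                recon[d] += wj * v
        loss += sq_dist(embedding[i], recon)
    return loss


if __name__ == "__main__":
    attrs = [[0.0], [1.0], [2.0], [3.0]]               # toy 1-D node attributes
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}      # toy path graph
    w_attr = local_weights(attrs, knn(attrs, k=2))
    w_topo = topology_weights(adj)
    # Illustrative combined score of a candidate embedding (here: the raw attributes).
    print(alignment_loss(attrs, w_attr) + alignment_loss(attrs, w_topo))
```

The script only scores a fixed embedding against both sets of local weights. Fitting the embedding to minimize such a combined loss (for example with gradient descent, or an eigen-decomposition as in classic LLE) is outside this sketch.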
