LG CL ML | Feb 24, 2020

FONDUE: A Framework for Node Disambiguation Using Network Embeddings

arXiv:2002.10127v1
AI Analysis

The paper addresses a common data quality issue in networks such as social or citation graphs, which can affect downstream tasks such as information diffusion analysis or bioinformatics. It represents an incremental improvement in node disambiguation methods.

The paper tackles node disambiguation in networks where a single node corresponds to multiple real-life entities. It introduces FONDUE, a framework based on network embeddings. On twelve benchmark datasets, FONDUE identifies ambiguous nodes substantially and uniformly more accurately than state-of-the-art methods, though it is less optimal for determining how to split them.

Real-world data often presents itself in the form of a network. Examples include social networks, citation networks, biological networks, and knowledge graphs. In their simplest form, networks represent real-life entities (e.g. people, papers, proteins, concepts) as nodes, and describe them in terms of their relations with other entities by means of edges between these nodes. This can be valuable for a range of purposes from the study of information diffusion to bibliographic analysis, bioinformatics research, and question-answering. The quality of networks is often problematic though, affecting downstream tasks. This paper focuses on the common problem where a node in the network in fact corresponds to multiple real-life entities. In particular, we introduce FONDUE, an algorithm based on network embedding for node disambiguation. Given a network, FONDUE identifies nodes that correspond to multiple entities, for subsequent splitting. Extensive experiments on twelve benchmark datasets demonstrate that FONDUE is substantially and uniformly more accurate for ambiguous node identification compared to the existing state-of-the-art, at a comparable computational cost, while less optimal for determining the best way to split ambiguous nodes.
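To make the problem setting concrete, the sketch below builds a toy co-authorship network in which two distinct authors have been collapsed into a single node, and then shows what "splitting" that node means. It illustrates only the task described in the abstract, not FONDUE's embedding-based method. All names here (`build_graph`, `merge_nodes`, `split_node`, and the author labels) are hypothetical and chosen for illustration, and the correct neighbour partition is supplied by hand rather than inferred.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected graph as a dict mapping node -> set of neighbours."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def merge_nodes(graph, a, b, merged):
    """Contract nodes a and b into one node, simulating an ambiguous node."""
    out = {}
    for node, nbrs in graph.items():
        new_node = merged if node in (a, b) else node
        new_nbrs = {merged if n in (a, b) else n for n in nbrs}
        out.setdefault(new_node, set()).update(new_nbrs)
    out[merged].discard(merged)  # drop the self-loop if a and b were adjacent
    return out


def split_node(graph, node, parts):
    """Replace `node` by one new node per entry of `parts`.

    `parts` maps each new node name to the set of neighbours it inherits.
    Together the sets must cover exactly the neighbours of `node`.
    """
    assigned = set().union(*parts.values())
    if assigned != graph[node]:
        raise ValueError("parts must cover exactly the neighbours of the node")
    out = {n: set(nbrs) for n, nbrs in graph.items() if n != node}
    for new_name, nbrs in parts.items():
        out[new_name] = set(nbrs)
        for n in nbrs:
            out[n].discard(node)
            out[n].add(new_name)
    return out


# Ground truth: two different authors, each with their own collaborators.
truth = build_graph([
    ("smith_a", "alice"), ("smith_a", "bob"), ("alice", "bob"),
    ("smith_b", "carol"), ("smith_b", "dave"), ("carol", "dave"),
])

# Observed network: both authors appear under one name (the ambiguous node).
observed = merge_nodes(truth, "smith_a", "smith_b", "j_smith")

# Splitting with the (here hand-supplied) correct partition of neighbours.
recovered = split_node(
    observed,
    "j_smith",
    {"smith_a": {"alice", "bob"}, "smith_b": {"carol", "dave"}},
)
```

In this framing, the two subtasks named in the abstract correspond to deciding that `j_smith` is the node to split (identification) and choosing the neighbour partition passed to `split_node` (splitting).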

