SI, LG, SOC-PH, ML
Nov 28, 2018

Link Prediction in Networks with Core-Fringe Data

arXiv:1811.11540v2
16 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of deciding how much collected network data to use for link prediction. The contribution is incremental: it builds on existing link prediction methods and analyzes how they behave on core-fringe structures.

The study investigates how including fringe nodes affects link prediction in networks with core-fringe data. It finds that, depending on the dataset, additional fringe data can hurt performance, help it, or help only up to a saturation point, and it shows that simple random graph models exhibit the same behaviors.

Data collection often involves the partial measurement of a larger system. A common example arises in collecting network data: we often obtain network datasets by recording all of the interactions among a small set of core nodes, so that we end up with a measurement of the network consisting of these core nodes along with a potentially much larger set of fringe nodes that have links to the core. Given the ubiquity of this process for assembling network data, it is crucial to understand the role of such a "core-fringe" structure. Here we study how the inclusion of fringe nodes affects the standard task of network link prediction. One might initially think the inclusion of any additional data is useful, and hence that it should be beneficial to include all fringe nodes that are available. However, we find that this is not true; in fact, there is substantial variability in the value of the fringe nodes for prediction. Once an algorithm is selected, in some datasets, including any additional data from the fringe can actually hurt prediction performance; in other datasets, including some amount of fringe information is useful before prediction performance saturates or even declines; and in further cases, including the entire fringe leads to the best performance. While such variety might seem surprising, we show that these behaviors are exhibited by simple random graph models.
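The core-fringe setup described in the abstract can be illustrated with a small sketch. It assumes a common-neighbors score as the link predictor, which is a standard choice but is not named in the abstract, and it uses a hand-made toy graph. All function names and node labels here are hypothetical and are not taken from the paper or its code.

```python
from collections import defaultdict
from itertools import combinations


def core_fringe_edges(edges, core):
    """Keep only edges with at least one endpoint in the core.

    This mimics the data collection process: links between two
    fringe nodes are never observed.
    """
    return [(u, v) for u, v in edges if u in core or v in core]


def build_adjacency(edges, allowed):
    """Undirected adjacency sets restricted to the allowed nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        if u in allowed and v in allowed:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def common_neighbor_scores(edges, core, fringe_subset):
    """Score each unlinked core pair by its number of common neighbors.

    Neighbors may come from the core or from the chosen fringe subset.
    Common neighbors is an assumed predictor used for illustration only.
    """
    allowed = set(core) | set(fringe_subset)
    adj = build_adjacency(edges, allowed)
    scores = {}
    for u, v in combinations(sorted(core), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


# Toy data (hypothetical): four core nodes and two fringe nodes.
core = {"a", "b", "c", "d"}
full_edges = [
    ("a", "b"), ("b", "c"),              # core-core links
    ("a", "x"), ("c", "x"),              # links to fringe node x
    ("a", "y"), ("d", "y"),              # links to fringe node y
    ("x", "y"),                          # fringe-fringe link, unobserved
]
observed = core_fringe_edges(full_edges, core)

no_fringe = common_neighbor_scores(observed, core, set())
some_fringe = common_neighbor_scores(observed, core, {"x"})
all_fringe = common_neighbor_scores(observed, core, {"x", "y"})

for name, scores in [("none", no_fringe), ("x only", some_fringe),
                     ("x and y", all_fringe)]:
    print(name, scores)
```

In this toy case the scores for candidate core pairs change as more fringe nodes are admitted, which is the quantity whose effect on prediction quality the paper studies. Whether the change helps or hurts depends on which candidate pairs actually form links, so the sketch shows only the mechanism, not any of the paper's results.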

Code Implementations: 1 repo
