Meta-learning optimizes predictions of missing links in real-world networks
This work addresses the challenge of incomplete network data in fields such as social network analysis and biology, and offers a practical way to optimize link prediction. The contribution is incremental in that it builds on existing stacking and neural network techniques.
The study tackled the problem of predicting missing links in real-world networks without node attributes by systematically comparing model stacking and graph neural network algorithms across 550 diverse networks. It found that no single algorithm performs best universally, but a meta-learning algorithm that selects the best algorithm based on network characteristics outperforms all state-of-the-art methods, achieving high scalability and improved accuracy.
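To make the evaluation setting above concrete, the following minimal sketch scores candidate node pairs with two simple topological predictors (common neighbors and the Jaccard coefficient) and measures AUC against held-out edges. The toy graph, the held-out pairs, and all function names are assumptions chosen for illustration; they are not the study's benchmark, its 42 predictors, or its code.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v in the observed graph."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def auc(pos_scores, neg_scores):
    """Probability that a held-out edge outscores a non-edge (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data (illustrative assumption): two 4-cliques sharing node 3,
# with one edge per clique hidden from the observed graph.
observed = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
            (3, 4), (3, 5), (3, 6), (4, 6), (5, 6)]
held_out = [(0, 1), (4, 5)]

adj = build_adjacency(observed)
known = set(observed) | set(held_out)
non_edges = [pair for pair in combinations(sorted(adj), 2) if pair not in known]

for name, predictor in [("common_neighbors", common_neighbors), ("jaccard", jaccard)]:
    pos = [predictor(adj, u, v) for u, v in held_out]
    neg = [predictor(adj, u, v) for u, v in non_edges]
    print(name, auc(pos, neg))
```

On a graph this small and this clustered, both predictors separate the hidden edges from the true non-edges; real networks are far less forgiving, which is why performance is compared across many networks.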
Relational data are ubiquitous in real-world data applications, e.g., in social network analysis or biological modeling, but networks are nearly always incompletely observed. The state-of-the-art for predicting missing links in the hard case of a network without node attributes uses model stacking or neural network techniques. It remains unknown which approach is best, and whether or how the best choice of algorithm depends on the input network's characteristics. We answer these questions systematically using a large, structurally diverse benchmark of 550 real-world networks under two standard accuracy measures (AUC and Top-k), comparing four stacking algorithms with 42 topological link predictors, two of which we introduce here, and two graph neural network algorithms. We show that no algorithm is best across all input networks, that all algorithms perform well on most social networks, and that few perform well on economic and biological networks. Overall, model stacking with a random forest is highly scalable and either surpasses graph neural networks (on AUC) or is competitive with them (on Top-k accuracy). However, algorithm performance depends strongly on network characteristics such as the degree distribution, triangle density, and degree assortativity. We introduce a meta-learning algorithm that exploits this variability to optimize link predictions for individual networks by selecting the best algorithm to apply, which we show outperforms all state-of-the-art algorithms and scales to large networks.
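The selection idea described above can be sketched as follows: summarize each network by a few structural features (mean degree, triangle density measured as transitivity, and degree assortativity), then pick the algorithm that was best on the most similar previously seen network. This is only an illustration under stated assumptions. A 1-nearest-neighbor rule stands in for the meta-learner, whose actual form the text does not specify; the labels `algorithm_A` and `algorithm_B` are hypothetical placeholders, and the reference graphs are toy examples rather than benchmark networks.

```python
import math
from itertools import combinations


def network_features(edges):
    """Return (mean degree, transitivity, degree assortativity) of an undirected graph.

    Assumes edges are unique (u, v) pairs without self-loops.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)

    # Transitivity: fraction of connected triples that are closed into triangles.
    closed = triples = 0
    for nbrs in adj.values():
        d = len(nbrs)
        triples += d * (d - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    transitivity = closed / triples if triples else 0.0

    # Assortativity: Pearson correlation of the degrees at the two ends of each
    # edge, taken over both orientations so the two marginals are identical.
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    assortativity = cov / var if var > 0 else 0.0  # undefined for regular graphs

    return (mean_degree, transitivity, assortativity)


def select_algorithm(query_features, training):
    """Pick the label of the training network closest in (unscaled) feature space."""
    _, label = min(training, key=lambda rec: math.dist(rec[0], query_features))
    return label


# Hypothetical training records: (features, placeholder label of the best algorithm).
clique = list(combinations(range(4), 2))   # dense, fully clustered toy graph
star = [(0, i) for i in range(1, 5)]       # hub-dominated, triangle-free toy graph
training = [
    (network_features(clique), "algorithm_A"),
    (network_features(star), "algorithm_B"),
]

# Query network: the same clique with one pendant node attached.
query = clique + [(3, 4)]
print(select_algorithm(network_features(query), training))
```

In practice the features would need scaling before distances are compared, and a learned model trained on many networks would replace the nearest-neighbor lookup; the sketch only shows how per-network structure can drive the choice of predictor.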