Node Duplication Improves Cold-start Link Prediction
NodeDup addresses the cold-start problem in recommendation systems by improving GNN performance for users with few interactions, though the contribution is incremental because it builds on existing GNN methods.
The paper tackles the weak link prediction performance of Graph Neural Networks (GNNs) on low-degree nodes, which matters most in cold-start scenarios such as recommendation systems. It proposes NodeDup, a node duplication augmentation technique that improves performance on low-degree nodes without harming high-degree nodes. Averaged across datasets, NodeDup improves results by 38.49% on isolated nodes, 13.34% on low-degree nodes, and 6.76% on warm nodes.
Graph Neural Networks (GNNs) are prominent in graph machine learning and have shown state-of-the-art performance in Link Prediction (LP) tasks. Nonetheless, recent studies show that GNNs struggle to produce good results on low-degree nodes despite their overall strong performance. In practical applications of LP, like recommendation systems, improving performance on low-degree nodes is critical, as it amounts to tackling the cold-start problem of improving the experiences of users with few observed interactions. In this paper, we investigate improving GNNs' LP performance on low-degree nodes while preserving their performance on high-degree nodes and propose a simple yet surprisingly effective augmentation technique called NodeDup. Specifically, NodeDup duplicates low-degree nodes and creates links between nodes and their own duplicates before following the standard supervised LP training scheme. By leveraging a "multi-view" perspective for low-degree nodes, NodeDup shows significant LP performance improvements on low-degree nodes without compromising any performance on high-degree nodes. Additionally, as a plug-and-play augmentation module, NodeDup can be easily applied to existing GNNs with very light computational cost. Extensive experiments show that NodeDup achieves 38.49%, 13.34%, and 6.76% improvements on isolated, low-degree, and warm nodes, respectively, on average across all datasets compared to GNNs and state-of-the-art cold-start methods.
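The duplication step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `node_dup`, the `degree_threshold` cutoff for what counts as "low-degree", the choice to copy the original node's features to its duplicate, and the toy graph are all assumptions made for illustration. The abstract only states that low-degree nodes are duplicated and linked to their duplicates before standard supervised LP training.

```python
from collections import Counter


def node_dup(num_nodes, edges, features, degree_threshold=1):
    """Duplicate low-degree nodes and link each one to its own duplicate.

    Hypothetical sketch: `degree_threshold` and the feature copy are
    assumptions, not details taken from the abstract.
    """
    # Count the degree of every node in the undirected edge list.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    aug_edges = list(edges)
    aug_features = [list(f) for f in features]
    duplicate_of = {}

    for node in range(num_nodes):
        # Isolated nodes have degree 0 (Counter returns 0 for missing keys).
        if degree[node] <= degree_threshold:
            dup = len(aug_features)                    # next free node id
            aug_features.append(list(features[node]))  # assumed: copy features
            aug_edges.append((node, dup))              # link node to duplicate
            duplicate_of[node] = dup

    return aug_edges, aug_features, duplicate_of


# Toy graph: node 0 is well connected, node 3 has one edge, node 4 is isolated.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]

aug_edges, aug_features, duplicate_of = node_dup(5, edges, features)
print(duplicate_of)   # nodes 3 and 4 receive duplicates 5 and 6
print(aug_edges[-2:])  # the two added node-to-duplicate links
```

In this sketch the augmented edge list and feature table would then be passed to whatever supervised LP training pipeline is already in use, which is consistent with the abstract's description of NodeDup as a plug-and-play module. Because only low-degree nodes gain a node and an edge, the added cost grows with the number of such nodes rather than with the full graph size.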