Graph Structure Refinement with Energy-based Contrastive Learning
This work addresses robustness issues in graph-structured data analysis for machine learning practitioners, though it appears incremental as it combines existing techniques like energy-based models and contrastive learning.
The paper tackles imperfect graph structures whose noisy links limit the performance of Graph Neural Networks. It proposes an unsupervised Energy-based Contrastive Learning guided Graph Structure Refinement (ECL-GSR) framework, which outperforms state-of-the-art methods on eight benchmark datasets in node classification while training faster and using fewer resources.
Graph Neural Networks (GNNs) have recently gained widespread attention as a successful tool for analyzing graph-structured data. However, an imperfect graph structure with noisy links is not robust and may damage graph representations, thereby limiting the performance of GNNs in practical tasks. Moreover, existing generative architectures are poorly suited to discriminative graph-related tasks. To tackle these issues, we introduce an unsupervised method that jointly performs generative and discriminative training to learn graph structure and representation, aiming to improve the discriminative performance of generative models. We propose an Energy-based Contrastive Learning (ECL) guided Graph Structure Refinement (GSR) framework, denoted as ECL-GSR. To our knowledge, this is the first work to combine energy-based models with contrastive learning for GSR. Specifically, we leverage ECL to approximate the joint distribution of sample pairs, which increases the similarity between representations of positive pairs while reducing the similarity between representations of negative pairs. The refined structure is produced by adding and removing edges according to similarity metrics among node representations. Extensive experiments demonstrate that ECL-GSR outperforms the state of the art on eight benchmark datasets in node classification. ECL-GSR also trains faster, with fewer samples and less memory than the leading baseline, highlighting its simplicity and efficiency in downstream tasks.
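The refinement step described above, in which edges are added or removed according to similarity among node representations, can be illustrated with a minimal sketch. This is not the ECL-GSR implementation: the use of cosine similarity, the two fixed thresholds, and the names `cosine_similarity`, `refine_structure`, `add_threshold`, and `remove_threshold` are all assumptions made for illustration, and the node representations are hand-written toy vectors rather than the output of a trained encoder.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def refine_structure(embeddings, edges, add_threshold=0.9, remove_threshold=0.1):
    """Return a refined undirected edge set (hypothetical thresholding rule).

    embeddings: list of node representation vectors, indexed by node id.
    edges: iterable of (i, j) pairs from the observed, possibly noisy, graph.
    """
    # Store each edge as (small, large) so the graph is treated as undirected.
    current = {(min(i, j), max(i, j)) for i, j in edges if i != j}
    refined = set()
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if (i, j) in current:
            # Keep an existing edge unless its endpoints look dissimilar.
            if sim >= remove_threshold:
                refined.add((i, j))
        elif sim >= add_threshold:
            # Add a missing edge between highly similar nodes.
            refined.add((i, j))
    return refined


# Toy data: nodes 0 and 1 are similar, and nodes 2 and 3 are similar.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
noisy_edges = [(0, 2), (2, 3)]  # (0, 2) is a noisy link; (0, 1) is missing
refined_edges = refine_structure(embeddings, noisy_edges)
print(sorted(refined_edges))  # [(0, 1), (2, 3)]
```

In this toy case the noisy link between the dissimilar nodes 0 and 2 is dropped and the missing link between the similar nodes 0 and 1 is added. The all-pairs loop is quadratic in the number of nodes, so a practical version would likely restrict candidate pairs, for example to nearest neighbours.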