Augmentation-Free Graph Contrastive Learning with Performance Guarantee
This work addresses a bottleneck in applying GCL to heterophilic graphs, offering a more efficient and effective solution for graph-structured data analysis.
The authors tackled the reliance on graph augmentations in graph contrastive learning (GCL) by proposing an augmentation-free method, AF-GCL, which uses aggregated features for self-supervision and is less sensitive to graph homophily; it achieved competitive or better performance on homophilic graphs and outperformed state-of-the-art methods on heterophilic graphs across 14 datasets with significantly less computational overhead.
Graph contrastive learning (GCL) is the most representative and prevalent self-supervised learning approach for graph-structured data. Despite its remarkable success, existing GCL methods highly rely on an augmentation scheme to learn the representations invariant across different augmentation views. In this work, we revisit such a convention in GCL through examining the effect of augmentation techniques on graph data via the lens of spectral theory. We found that graph augmentations preserve the low-frequency components and perturb the middle-and high-frequency components of the graph, which contributes to the success of GCL algorithms on homophilic graphs but hinder its application on heterophilic graphs, due to the high-frequency preference of heterophilic data. Motivated by this, we propose a novel, theoretically-principled, and augmentation-free GCL method, named AF-GCL, that (1) leverages the features aggregated by Graph Neural Network to construct the self-supervision signal instead of augmentations and therefore (2) is less sensitive to the graph homophily degree. Theoretically, We present the performance guarantee for AF-GCL as well as an analysis for understanding the efficacy of AF-GCL. Extensive experiments on 14 benchmark datasets with varying degrees of heterophily show that AF-GCL presents competitive or better performance on homophilic graphs and outperforms all existing state-of-the-art GCL methods on heterophilic graphs with significantly less computational overhead.