GeoMix: Towards Geometry-Aware Data Augmentation
GeoMix addresses the problem of limited labeled data in graph learning, offering researchers and practitioners a new way to enhance GNN performance. The contribution is incremental in that it builds on the existing Mixup paradigm.
The paper tackles the challenge of applying Mixup to node classification in graph learning. It proposes GeoMix, a geometry-aware data augmentation method that uses in-place graph editing to generate synthetic nodes and their connections, achieving state-of-the-art results on standard datasets and improving generalization on out-of-distribution tasks.
Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification. By synthesizing samples through the interpolation of features and labels, Mixup effectively addresses the issue of data scarcity. However, it has rarely been explored in graph learning tasks due to the irregularity and connectivity of graph data. Specifically, in node classification tasks, Mixup presents a challenge in creating connections for synthetic data. In this paper, we propose Geometric Mixup (GeoMix), a simple and interpretable Mixup approach leveraging in-place graph editing. It effectively utilizes geometry information to interpolate features and labels with those from the nearby neighborhood, generating synthetic nodes and establishing connections for them. We conduct theoretical analysis to elucidate the rationale behind employing geometry information for node Mixup, emphasizing the significance of locality enhancement, a critical aspect of our method's design. Extensive experiments demonstrate that our lightweight Geometric Mixup achieves state-of-the-art results on a wide variety of standard datasets with limited labeled data. Furthermore, it significantly improves the generalization capability of underlying GNNs across various challenging out-of-distribution generalization tasks. Our code is available at https://github.com/WtaoZhao/geomix.
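To make the idea of interpolating features and labels with those from the nearby neighborhood concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes a simple mean over direct neighbors and a single mixing weight; the function name `neighborhood_mixup`, the parameter `alpha`, and the toy path graph are all hypothetical choices made for illustration. Because the graph structure is left untouched, each mixed node simply keeps the connections of the node it replaces, which is one plausible reading of "in-place" editing.

```python
def neighborhood_mixup(features, labels, neighbors, alpha=0.5):
    """Mix each node's features and soft labels with its neighborhood mean.

    Hypothetical helper for illustration only: `alpha` weights the node's
    own values and (1 - alpha) weights the mean over its direct neighbors.
    Isolated nodes are copied unchanged. The adjacency is not modified.
    """
    def mean(rows):
        # Column-wise mean of a non-empty list of equal-length vectors.
        return [sum(col) / len(rows) for col in zip(*rows)]

    def mix(own, nbr_mean):
        # Convex combination of a node's vector and its neighborhood mean.
        return [alpha * a + (1 - alpha) * b for a, b in zip(own, nbr_mean)]

    mixed_x, mixed_y = [], []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            mixed_x.append(list(features[i]))
            mixed_y.append(list(labels[i]))
            continue
        mixed_x.append(mix(features[i], mean([features[j] for j in nbrs])))
        mixed_y.append(mix(labels[i], mean([labels[j] for j in nbrs])))
    return mixed_x, mixed_y


# Toy path graph 0 - 1 - 2 with 2-d features and one-hot labels (assumed data).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
neighbors = [[1], [0, 2], [1]]

mixed_x, mixed_y = neighborhood_mixup(features, labels, neighbors, alpha=0.5)
```

In this sketch the mixed labels remain valid probability vectors, since each is a convex combination of one-hot rows. Restricting the interpolation to graph neighbors, rather than to random pairs as in standard Mixup, is what keeps the synthetic nodes local.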