LG · Nov 26, 2024

Contrastive Graph Condensation: Advancing Data Versatility through Self-Supervised Learning

arXiv:2411.17063v2 · 10 citations · h-index: 13 · KDD
Originality: Highly original
AI Analysis

This work addresses a bottleneck in graph neural network training for scenarios with sparse labels, offering improved generalization across tasks.

The paper tackles the problem of graph condensation methods overfitting to class-specific information and relying heavily on node labels, which limits their utility in label-sparse scenarios and generalization to other downstream tasks. It introduces Contrastive Graph Condensation (CTGC), a self-supervised approach that consistently outperforms state-of-the-art methods in handling various downstream tasks with limited labels.

With the increasing computational cost of training graph neural networks (GNNs) on large-scale graphs, graph condensation (GC) has emerged as a promising solution that synthesizes a compact substitute for the large-scale original graph to enable efficient GNN training. However, existing GC methods predominantly employ classification as the surrogate task for optimization, thus relying excessively on node labels and constraining their utility in label-sparse scenarios. More critically, this surrogate task tends to overfit class-specific information within the condensed graph, consequently restricting the generalization capabilities of GC for other downstream tasks. To address these challenges, we introduce Contrastive Graph Condensation (CTGC), which adopts a self-supervised surrogate task to extract critical, causal information from the original graph and enhance the cross-task generalizability of the condensed graph. Specifically, CTGC employs a dual-branch framework to disentangle the generation of the node attributes and graph structures, where a dedicated structural branch is designed to explicitly encode geometric information through nodes' positional embeddings. By implementing an alternating optimization scheme with contrastive loss terms, CTGC promotes the mutual enhancement of both branches and facilitates high-quality graph generation through the model inversion technique. Extensive experiments demonstrate that CTGC excels in handling various downstream tasks with a limited number of labels, consistently outperforming state-of-the-art GC methods.
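
The abstract mentions contrastive loss terms but does not give their form. As a rough illustration of what such a term looks like in general, the sketch below implements a generic InfoNCE-style loss between two sets of node embeddings. It is not taken from the paper and should not be read as CTGC's actual objective; the function name `info_nce`, the `temperature` parameter, and the convention that row i of each input forms a positive pair are all assumptions made for this example.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """Generic InfoNCE loss (illustrative only, not CTGC's loss).

    z1, z2: arrays of shape (n, d). Row i of z1 and row i of z2 are
    treated as a positive pair; all other rows of z2 act as negatives.
    """
    # L2-normalise so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    # Pairwise similarities, scaled by the (hypothetical) temperature.
    logits = z1 @ z2.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
aligned = info_nce(z, z)                        # each row paired with itself
misaligned = info_nce(z, np.roll(z, 1, axis=0))  # each row paired with a different row
```

In this toy setup the aligned pairing yields a lower loss than the misaligned one, which is the behaviour a contrastive objective rewards: embeddings of matching nodes are pulled together while non-matching ones are pushed apart.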

