LG · Feb 7, 2024

Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching

Peking U
arXiv:2402.05011v3 · 41 citations · h-index: 13 · Has Code · ICML
Originality: Incremental advance
AI Analysis

This work addresses the computational cost of training GNNs on large-scale graph datasets, offering a domain-specific improvement for graph machine learning applications.

The paper tackles graph condensation for Graph Neural Networks (GNNs). It addresses limitations that keep existing methods from achieving lossless condensation and proposes a new method that retains up to 99.9% of performance on benchmark datasets.

Graph condensation aims to reduce the size of a large-scale graph dataset by synthesizing a compact counterpart without sacrificing the performance of Graph Neural Networks (GNNs) trained on it, which has shed light on reducing the computational cost for training GNNs. Nevertheless, existing methods often fall short of accurately replicating the original graph for certain datasets, thereby failing to achieve the objective of lossless condensation. To understand this phenomenon, we investigate the potential reasons and reveal that the previous state-of-the-art trajectory matching method provides biased and restricted supervision signals from the original graph when optimizing the condensed one. This significantly limits both the scale and efficacy of the condensed graph. In this paper, we make the first attempt toward lossless graph condensation by bridging the previously neglected supervision signals. Specifically, we employ a curriculum learning strategy to train expert trajectories with more diverse supervision signals from the original graph, and then effectively transfer the information into the condensed graph with expanding window matching. Moreover, we design a loss function to further extract knowledge from the expert trajectories. Theoretical analysis justifies the design of our method and extensive experiments verify its superiority across different datasets. Code is released at https://github.com/NUS-HPC-AI-Lab/GEOM.
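
To make the idea of expanding window matching more concrete, here is a minimal sketch. It is not taken from the released code. It assumes the window from which the expert start epoch is sampled grows linearly with the condensation step up to a cap, and it uses the normalized parameter-distance loss common in trajectory matching rather than the loss the paper designs. All names and parameters (`window_upper_bound`, `sample_start_epoch`, `matching_loss`, `growth`) are hypothetical.

```python
import random


def window_upper_bound(step, initial_bound, max_bound, growth=1):
    """Upper end of the start-epoch sampling window at a condensation step.

    Assumption: the window grows linearly with `step` and is capped at
    `max_bound`. The actual schedule in the paper may differ.
    """
    return min(initial_bound + growth * step, max_bound)


def sample_start_epoch(step, initial_bound, max_bound, growth=1, rng=random):
    """Sample an expert start epoch uniformly from [0, current upper bound]."""
    upper = window_upper_bound(step, initial_bound, max_bound, growth)
    return rng.randint(0, upper)  # randint is inclusive on both ends


def matching_loss(student_end, expert_start, expert_end):
    """Normalized squared distance between student and expert parameters.

    Parameters are flat lists of floats here for simplicity. The numerator
    measures how far the student (trained on the condensed graph) ends from
    the expert target; the denominator is how far the expert itself moved.
    """
    numerator = sum((s - e) ** 2 for s, e in zip(student_end, expert_end))
    denominator = sum((a - b) ** 2 for a, b in zip(expert_start, expert_end))
    return numerator / denominator


if __name__ == "__main__":
    # Early steps only sample early expert epochs; later steps see more.
    for step in (0, 10, 100):
        print(step, window_upper_bound(step, initial_bound=5, max_bound=20))
```

Under these assumptions, early condensation steps match only the early segments of the expert trajectories, and later steps gradually admit later segments, which is one way to read the "more diverse supervision signals" the abstract mentions.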

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
