ML LG May 24, 2022

Not too little, not too much: a theoretical analysis of graph (over)smoothing

arXiv:2205.12156v233.8170 citations h-index: 12 Has Code

Originality Incremental advance

AI Analysis

The paper addresses the oversmoothing problem in GNNs and offers theoretical insight into the optimal amount of smoothing. The advance is incremental, since it builds on existing analyses.

The paper analyzes graph smoothing with mean aggregation in linear GNNs, showing that a finite number of steps improves learning performance by restoring lost information before oversmoothing occurs, with specific benefits for regression and classification tasks.

We analyze graph smoothing with mean aggregation, where each node successively receives the average of the features of its neighbors. Indeed, it has quickly been observed that Graph Neural Networks (GNNs), which generally follow some variant of Message-Passing (MP) with repeated aggregation, may be subject to the oversmoothing phenomenon: by performing too many rounds of MP, the node features tend to converge to a non-informative limit. In the case of mean aggregation, for connected graphs, the node features become constant across the whole graph. At the other end of the spectrum, it is intuitively obvious that some MP rounds are necessary, but existing analyses do not exhibit both phenomena at once: beneficial "finite" smoothing and oversmoothing in the limit. In this paper, we consider simplified linear GNNs, and rigorously analyze two examples for which a finite number of mean aggregation steps provably improves the learning performance, before oversmoothing kicks in. We consider a latent space random graph model, where node features are partial observations of the latent variables and the graph contains pairwise relationships between them. We show that graph smoothing restores some of the lost information, up to a certain point, through two phenomena: graph smoothing shrinks non-principal directions in the data faster than principal ones, which is useful for regression, and shrinks nodes within communities faster than they collapse together, which improves classification.
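As a rough illustration of the mean aggregation step described in the abstract, the sketch below repeatedly replaces each node's feature with the average over its neighbors and tracks how far apart the node features remain. It is not the paper's code: the function names (`mean_aggregation_operator`, `smooth`, `spread`), the 4-node path graph, and the use of self-loops are all assumptions made for this example.

```python
import numpy as np

def mean_aggregation_operator(adj):
    """Row-normalize an adjacency matrix so each row averages over neighbors."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    if np.any(deg == 0):
        raise ValueError("every node needs at least one neighbor")
    return adj / deg

def smooth(features, adj, steps):
    """Apply `steps` rounds of mean aggregation to the node features."""
    op = mean_aggregation_operator(adj)
    out = np.asarray(features, dtype=float)
    for _ in range(steps):
        out = op @ out
    return out

def spread(z):
    """Gap between the largest and smallest node feature."""
    return float(z.max() - z.min())

# Toy connected graph (assumption): a 4-node path with self-loops,
# so each node also includes its own feature in the average.
adj = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])
x = np.array([[1.0], [0.0], [0.0], [-1.0]])

spread_0 = spread(smooth(x, adj, 0))      # no smoothing
spread_2 = spread(smooth(x, adj, 2))      # a few rounds
spread_200 = spread(smooth(x, adj, 200))  # many rounds

print(spread_0, spread_2, spread_200)
```

On this toy graph the spread shrinks with each round and is numerically zero after many rounds, which is the oversmoothing limit for connected graphs. The sketch only shows that limit; whether an intermediate number of rounds helps a downstream task depends on the data model analyzed in the paper.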
