LG · Jun 9, 2022

Model Degradation Hinders Deep Graph Neural Networks

arXiv:2206.04361v1 · 56 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in graph mining by enabling deeper GNNs with improved expressive power, though the advance is incremental because it builds on existing GNN frameworks.

The paper tackles the performance degradation problem in deep Graph Neural Networks (GNNs) by identifying model degradation as the major cause, rather than over-smoothing, and introduces the Adaptive Initial Residual (AIR) module, which improves performance on six real-world datasets while allowing deeper architectures with negligible time cost.

Graph Neural Networks (GNNs) have achieved great success in various graph mining tasks. However, drastic performance degradation is always observed when a GNN is stacked with many layers. As a result, most GNNs only have shallow architectures, which limits their expressive power and exploitation of deep neighborhoods. Most recent studies attribute the performance degradation of deep GNNs to the \textit{over-smoothing} issue. In this paper, we disentangle the conventional graph convolution operation into two independent operations: \textit{Propagation} (\textbf{P}) and \textit{Transformation} (\textbf{T}). Following this, the depth of a GNN can be split into the propagation depth ($D_p$) and the transformation depth ($D_t$). Through extensive experiments, we find that the major cause for the performance degradation of deep GNNs is the \textit{model degradation} issue caused by large $D_t$ rather than the \textit{over-smoothing} issue mainly caused by large $D_p$. Further, we present \textit{Adaptive Initial Residual} (AIR), a plug-and-play module compatible with all kinds of GNN architectures, to alleviate the \textit{model degradation} issue and the \textit{over-smoothing} issue simultaneously. Experimental results on six real-world datasets demonstrate that GNNs equipped with AIR outperform most GNNs with shallow architectures owing to the benefits of both large $D_p$ and $D_t$, while the time costs associated with AIR can be ignored.
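
The split into Propagation and Transformation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code: the symmetric normalization, the ReLU layers, the toy graph, and every function name here are assumptions chosen for illustration. In particular, `propagate_with_initial_residual` shows only one plausible form of a gated initial residual (a per-node sigmoid gate mixing the input features back in at each step); the abstract does not give AIR's exact formulation, so it should be read as a hypothetical stand-in rather than as AIR itself.

```python
import numpy as np

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, a common GCN-style normalization (assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(a_norm, x, d_p):
    """P: d_p parameter-free smoothing steps, so the propagation depth D_p = d_p."""
    h = x
    for _ in range(d_p):
        h = a_norm @ h
    return h

def transform(h, weights):
    """T: one dense layer per weight matrix, so the transformation depth D_t = len(weights)."""
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU between layers, none after the last
    return h

def propagate_with_initial_residual(a_norm, x, d_p, gate_w):
    """Hypothetical gated initial residual: mix the input features back in at every step."""
    h = x
    for _ in range(d_p):
        smoothed = a_norm @ h
        # Per-node gate in (0, 1) computed from the initial and smoothed features.
        logits = np.concatenate([x, smoothed], axis=1) @ gate_w
        alpha = 1.0 / (1.0 + np.exp(-logits))
        h = alpha * x + (1.0 - alpha) * smoothed
    return h

# Toy 4-node path graph with 3 input features per node (illustrative data only).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
a_norm = normalized_adjacency(adj)

# D_p = 3 and D_t = 2 are chosen independently of each other.
weights = [rng.normal(size=(3, 8)), rng.normal(size=(8, 2))]
out = transform(propagate(a_norm, x, d_p=3), weights)

# With a zero gate every node mixes input and smoothed features half and half.
gate_w = np.zeros((6, 1))
res = propagate_with_initial_residual(a_norm, x, d_p=1, gate_w=gate_w)
```

Keeping `propagate` free of parameters is what lets $D_p$ grow without adding layers to train, while `len(weights)` alone sets $D_t$; in this sketch the two depths can be varied separately to probe which one drives a drop in accuracy.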

Code Implementations: 1 repo