LG · Jun 21, 2025

Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning

Furong Peng, Jinzhen Gao, Xuan Lu, Kang Liu, Yifan Huo, Sheng Wang

arXiv:2506.17576v24.1 · h-index: 3 · Has Code · ECML/PKDD

Originality: Incremental advance

AI Analysis

This work addresses a critical bottleneck for researchers and practitioners using deep GCNs in graph-based tasks, offering an incremental improvement through a novel training strategy.

The paper tackles over-smoothing in deep Graph Convolutional Networks (GCNs), which degrades performance as depth grows, and proposes Layer-wise Gradual Training (LGT) to address it. The authors report state-of-the-art performance for vanilla GCN on benchmark datasets, with accuracy gains even in 32-layer settings.

Graph Convolutional Networks (GCNs) suffer from severe performance degradation in deep architectures due to over-smoothing. While existing studies primarily attribute the over-smoothing to repeated applications of graph Laplacian operators, our empirical analysis reveals a critical yet overlooked factor: trainable linear transformations in GCNs significantly exacerbate feature collapse, even at moderate depths (e.g., 8 layers). In contrast, Simplified Graph Convolution (SGC), which removes these transformations, maintains stable feature diversity up to 32 layers, highlighting linear transformations' dual role in facilitating expressive power and inducing over-smoothing. However, completely removing linear transformations weakens the model's expressive capacity. To address this trade-off, we propose Layer-wise Gradual Training (LGT), a novel training strategy that progressively builds deep GCNs while preserving their expressiveness. LGT integrates three complementary components: (1) layer-wise training to stabilize optimization from shallow to deep layers, (2) low-rank adaptation to fine-tune shallow layers and accelerate training, and (3) identity initialization to ensure smooth integration of new layers and accelerate convergence. Extensive experiments on benchmark datasets demonstrate that LGT achieves state-of-the-art performance on vanilla GCN, significantly improving accuracy even in 32-layer settings. Moreover, as a training method, LGT can be seamlessly combined with existing methods such as PairNorm and ContraNorm, further enhancing their performance in deeper networks. LGT offers a general, architecture-agnostic training framework for scalable deep GCNs. The code is available at [https://github.com/jfklasdfj/LGT_GCN].
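The three LGT components named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the class `LowRankLinear`, the helper `grow`, the flag `full_rank_trainable`, the toy graph, and the choice of which parameters are trainable at each stage are all assumptions made for illustration, and no training loop is included. The sketch only shows the structural idea: layers are appended one at a time, each new layer starts with an identity weight matrix, and earlier layers keep their weight and expose only a low-rank update `B @ A`.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def normalized_adjacency(edges, n):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a = identity(n)
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]

class LowRankLinear:
    """Hypothetical layer weight W + B @ A (LoRA-style low-rank update)."""
    def __init__(self, dim, rank, rng):
        self.w = identity(dim)                       # identity initialization (assumed form)
        self.b = [[0.0] * rank for _ in range(dim)]  # zeros, so B @ A = 0 at the start
        self.a = [[rng.gauss(0, 0.01) for _ in range(dim)] for _ in range(rank)]
        self.full_rank_trainable = True              # assumption: newest layer trains W fully

    def weight(self):
        delta = matmul(self.b, self.a)
        return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(self.w, delta)]

def grow(layers, dim, rank, rng):
    """Append one identity-initialized layer; older layers switch to low-rank updates only."""
    for layer in layers:
        layer.full_rank_trainable = False
    return layers + [LowRankLinear(dim, rank, rng)]

def gcn_forward(a_hat, h, layers):
    """Vanilla GCN forward pass: H <- ReLU(A_hat @ H @ W) for each layer."""
    for layer in layers:
        h = matmul(matmul(a_hat, h), layer.weight())
        h = [[max(0.0, v) for v in row] for row in h]
    return h

rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3)]                      # hypothetical 4-node path graph
a_hat = normalized_adjacency(edges, 4)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.0]]  # nonnegative toy features

layers = []
for _ in range(3):                                    # grow the network layer by layer
    layers = grow(layers, dim=2, rank=1, rng=rng)

out = gcn_forward(a_hat, x, layers)

# With identity weights and a zero low-rank update, each layer is pure propagation,
# so the untrained 3-layer stack matches three applications of A_hat to x.
expected = x
for _ in range(3):
    expected = matmul(a_hat, expected)
```

In this sketch a freshly added layer behaves like one extra SGC-style propagation step on nonnegative features, so appending it does not disturb what the shallower stack has already learned until training moves its weights away from the identity. How the paper schedules training across stages may differ from the flag used here.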
