What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders
This work addresses a theoretical gap in self-supervised learning for graph-structured data and offers insights that could benefit researchers and practitioners in graph machine learning, although the contribution may be incremental because it builds on existing masked autoencoding ideas.
The paper tackles the lack of theoretical understanding of masked autoencoding for graph autoencoders (GAEs) by proposing MaskGAE, a self-supervised framework that uses masked graph modeling: a portion of the edges is masked and then reconstructed from the remaining visible structure. It argues, through a connection between GAEs and contrastive learning, that this pretext task improves the representations GAEs learn, and it reports empirical results that outperform state-of-the-art methods on link prediction and node classification across a variety of graph benchmarks.
Recent years have witnessed the emergence of a promising self-supervised learning strategy, referred to as masked autoencoding. However, there is a lack of theoretical understanding of how masking matters for graph autoencoders (GAEs). In this work, we present the masked graph autoencoder (MaskGAE), a self-supervised learning framework for graph-structured data. Different from standard GAEs, MaskGAE adopts masked graph modeling (MGM) as a principled pretext task: masking a portion of edges and attempting to reconstruct the missing part from the partially visible, unmasked graph structure. To understand whether MGM can help GAEs learn better representations, we provide both theoretical and empirical evidence to comprehensively justify the benefits of this pretext task. Theoretically, we establish close connections between GAEs and contrastive learning, showing that MGM significantly improves the self-supervised learning scheme of GAEs. Empirically, we conduct extensive experiments on a variety of graph benchmarks, demonstrating the superiority of MaskGAE over several state-of-the-art methods on both link prediction and node classification tasks.
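The masked graph modeling task described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `mask_edges` and `reconstruction_loss`, the uniform random edge masking, the inner-product decoder, and the binary cross-entropy objective are all assumptions chosen for illustration, and the encoder that would compute node embeddings from the visible graph is omitted.

```python
import numpy as np


def mask_edges(edges, mask_ratio=0.5, seed=0):
    """Split an edge list into visible and masked subsets.

    Hypothetical helper: edges are masked uniformly at random, which is
    one possible masking strategy, not necessarily the paper's.
    """
    rng = np.random.default_rng(seed)
    edges = np.asarray(edges)
    num_masked = int(round(mask_ratio * len(edges)))
    perm = rng.permutation(len(edges))
    masked = edges[perm[:num_masked]]
    visible = edges[perm[num_masked:]]
    return visible, masked


def reconstruction_loss(z, pos_edges, neg_edges):
    """Binary cross-entropy over edge scores from an inner-product decoder.

    Assumption: z holds node embeddings that an encoder would compute
    from the visible edges only; pos_edges are the masked (true) edges
    and neg_edges are sampled non-edges.
    """
    def scores(e):
        return np.sum(z[e[:, 0]] * z[e[:, 1]], axis=1)

    pos, neg = scores(pos_edges), scores(neg_edges)
    # -log(sigmoid(x)) = logaddexp(0, -x); -log(1 - sigmoid(x)) = logaddexp(0, x)
    return np.logaddexp(0, -pos).mean() + np.logaddexp(0, neg).mean()


# Toy graph with 5 nodes and 6 edges (illustrative data only).
edges = np.array([[0, 1], [0, 2], [1, 2], [2, 3], [3, 4], [1, 4]])
visible, masked = mask_edges(edges, mask_ratio=0.5, seed=0)

# Placeholder embeddings; a real model would encode the visible graph.
z = np.zeros((5, 8))
neg_edges = np.array([[0, 3], [0, 4], [1, 3]])
loss = reconstruction_loss(z, masked, neg_edges)
```

In this sketch the encoder would only ever see `visible`, while the loss is computed on `masked` plus sampled non-edges, so the model cannot succeed by copying its input structure.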