LG · AI · SI · Feb 14, 2024

SimMLP: Training MLPs on Graphs without Supervision

arXiv:2402.08918v3 · 19 citations · h-index: 24 · Has Code · WSDM
Originality: Highly original
AI Analysis

This paper addresses slow inference in graph learning for applications such as real-time fraud detection. It offers a novel method that integrates structural information into MLPs, though it builds on prior distillation approaches.

The paper tackles the challenge of deploying Graph Neural Networks (GNNs) in latency-sensitive applications by proposing SimMLP, a self-supervised framework that trains Multi-Layer Perceptrons (MLPs) on graphs to accelerate inference. Experiments on 20 benchmark datasets show that it outperforms state-of-the-art baselines, especially on unseen nodes.

Graph Neural Networks (GNNs) have demonstrated their effectiveness in various graph learning tasks, yet their reliance on neighborhood aggregation during inference poses challenges for deployment in latency-sensitive applications, such as real-time financial fraud detection. To address this limitation, recent studies have proposed distilling knowledge from teacher GNNs into student Multi-Layer Perceptrons (MLPs) trained on node content, aiming to accelerate inference. However, these approaches often inadequately explore structural information when inferring unseen nodes. To this end, we introduce SimMLP, a Self-supervised framework for learning MLPs on graphs, designed to fully integrate rich structural information into MLPs. Notably, SimMLP is the first MLP-learning method that can achieve equivalence to GNNs in the optimal case. The key idea is to employ self-supervised learning to align the representations encoded by graph context-aware GNNs and neighborhood dependency-free MLPs, thereby fully integrating the structural information into MLPs. We provide a comprehensive theoretical analysis, demonstrating the equivalence between SimMLP and GNNs based on mutual information and inductive bias, highlighting SimMLP's advanced structural learning capabilities. Additionally, we conduct extensive experiments on 20 benchmark datasets, covering node classification, link prediction, and graph classification, to showcase SimMLP's superiority over state-of-the-art baselines, particularly in scenarios involving unseen nodes (e.g., inductive and cold-start node classification) where structural insights are crucial. Our codes are available at: https://github.com/Zehong-Wang/SimMLP.
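The alignment idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the mean aggregator, the cosine-based loss, the function names, and the three-node graph are all assumptions chosen for illustration, and no training step is shown. It only computes how far each node's MLP embedding (from its own features) is from a neighborhood-aggregated view of the same embeddings.

```python
import math
import random


def mlp_encode(x, w1, w2):
    """Two-layer MLP with ReLU; uses only the node's own features."""
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row))) for row in w1]
    return [sum(hi * w for hi, w in zip(hidden, row)) for row in w2]


def mean_aggregate(embeddings, neighbors, node):
    """Structure-aware view (assumed): average the node's and its neighbors' embeddings."""
    group = [node] + neighbors[node]
    dim = len(embeddings[node])
    return [sum(embeddings[j][d] for j in group) / len(group) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def alignment_loss(features, neighbors, w1, w2):
    """Mean (1 - cosine) between each node's MLP view and its aggregated view."""
    mlp_view = {i: mlp_encode(x, w1, w2) for i, x in features.items()}
    agg_view = {i: mean_aggregate(mlp_view, neighbors, i) for i in features}
    total = sum(1.0 - cosine(mlp_view[i], agg_view[i]) for i in features)
    return total / len(features)


# Hypothetical toy graph: 3 nodes, 2 input features each.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}

rng = random.Random(0)
w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]  # 2 -> 4
w2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 4 -> 3

loss = alignment_loss(features, neighbors, w1, w2)
print(f"alignment loss: {loss:.4f}")
```

In this sketch the graph is touched only inside `alignment_loss`. Embedding a previously unseen node requires just `mlp_encode(x, w1, w2)`, with no neighbor lookup, which is the source of the inference speed-up the abstract describes.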

Code Implementations: 1 repo
