LG · Jul 19, 2025

ReDiSC: A Reparameterized Masked Diffusion Model for Scalable Node Classification with Structured Predictions

arXiv:2507.14484v1 · 2 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making structured predictions for node labels in graphs, which matters for applications such as social network analysis and recommendation systems. It is an incremental advance that builds on diffusion models and GNNs.

The paper tackles the problem of structured node classification in graphs by proposing ReDiSC, a reparameterized masked diffusion model that estimates the joint distribution of node labels, achieving superior or competitive performance compared to state-of-the-art methods across various graph types and scaling effectively to large datasets.
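To make "estimating the joint distribution of node labels with a masked diffusion model" concrete, here is a minimal sketch of generic masked (absorbing-state) discrete diffusion over node labels. It is not ReDiSC's reparameterized model: the linear masking schedule, the `MASK` sentinel, and the neighbour-majority `toy_denoiser` (a hypothetical stand-in for a learned GNN denoiser) are all illustrative assumptions. The point it shows is that the reverse process reveals labels a few at a time, so later predictions condition on labels already revealed rather than being made independently.

```python
import random
from collections import Counter

MASK = -1  # hypothetical absorbing "mask" state for an unrevealed label


def forward_mask(labels, t, rng):
    """Forward corruption: each label is independently replaced by MASK
    with probability t (a linear masking schedule is assumed)."""
    return [MASK if rng.random() < t else y for y in labels]


def toy_denoiser(adj, y_t):
    """Hypothetical stand-in for a learned GNN denoiser: each masked node
    predicts the majority label among its unmasked neighbours (class 0 if
    none are revealed yet); unmasked nodes keep their label."""
    preds = []
    for v, y in enumerate(y_t):
        if y != MASK:
            preds.append(y)
            continue
        votes = Counter(y_t[u] for u in adj[v] if y_t[u] != MASK)
        preds.append(votes.most_common(1)[0][0] if votes else 0)
    return preds


def reverse_sample(adj, observed, steps, rng):
    """Reverse process: starting from the observed labels (MASK elsewhere),
    reveal roughly 1/s of the still-masked nodes at each remaining step s,
    so later predictions condition on labels revealed earlier."""
    y = list(observed)
    for s in range(steps, 0, -1):
        masked = [v for v, lab in enumerate(y) if lab == MASK]
        if not masked:
            break
        k = max(1, len(masked) // s)  # the final step (s = 1) reveals the rest
        preds = toy_denoiser(adj, y)
        for v in rng.sample(masked, k):
            y[v] = preds[v]
    return y
```

For example, on a star graph whose centre is labelled 2, `reverse_sample([[1, 2, 3], [0], [0], [0]], [2, MASK, MASK, MASK], steps=3, rng=random.Random(0))` fills every leaf with label 2. In a real model the denoiser would output a distribution per node and labels would be sampled from it.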

In recent years, graph neural networks (GNNs) have achieved unprecedented success in node classification tasks. Although GNNs inherently encode specific inductive biases (e.g., acting as low-pass or high-pass filters), most existing methods implicitly assume conditional independence among node labels in their optimization objectives. While this assumption suits traditional classification tasks such as image recognition, it contradicts the intuitive observation that node labels in graphs remain correlated even after conditioning on the graph structure. To make structured predictions for node labels, we propose ReDiSC, a Reparameterized masked Diffusion model for Structured node Classification. ReDiSC estimates the joint distribution of node labels using a reparameterized masked diffusion model, which is learned through the variational expectation-maximization (EM) framework. Our theoretical analysis shows the efficiency advantage of ReDiSC in the E-step compared to DPM-SNC, a state-of-the-art model that relies on a manifold-constrained diffusion model in the continuous domain. Meanwhile, we explicitly link ReDiSC's M-step objective to popular hybrid approaches that combine GNNs with label propagation. Extensive experiments demonstrate that ReDiSC achieves superior or highly competitive performance compared to state-of-the-art GNN, label propagation, and diffusion-based baselines across both homophilic and heterophilic graphs of varying sizes. Notably, ReDiSC scales effectively to large-scale datasets on which previous structured diffusion methods fail due to computational constraints, highlighting its significant practical advantage in structured node classification tasks.
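The abstract links ReDiSC's M-step objective to hybrid approaches that combine GNNs with label propagation. As background only, here is a minimal sketch of classic label propagation, iterating Y ← αSY + (1 − α)Y0 with the symmetric-normalized adjacency S = D^{-1/2} A D^{-1/2}. This is the generic algorithm, not ReDiSC's M-step; the function name, the adjacency-list input, `alpha = 0.9`, and the iteration count are illustrative assumptions.

```python
import math


def label_propagation(adj, seed_labels, num_classes, alpha=0.9, iters=100):
    """Classic label propagation: iterate Y <- alpha * S @ Y + (1 - alpha) * Y0,
    where S = D^{-1/2} A D^{-1/2} for an undirected adjacency list `adj` and
    Y0 holds one-hot rows for the seed (labelled) nodes, zeros elsewhere."""
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    y0 = [[0.0] * num_classes for _ in range(n)]
    for v, c in seed_labels.items():
        y0[v][c] = 1.0
    y = [row[:] for row in y0]
    for _ in range(iters):
        new_y = []
        for v in range(n):
            agg = [0.0] * num_classes
            for u in adj[v]:
                w = 1.0 / math.sqrt(deg[v] * deg[u])  # entry S[v][u]
                for c in range(num_classes):
                    agg[c] += w * y[u][c]
            new_y.append([alpha * agg[c] + (1 - alpha) * y0[v][c]
                          for c in range(num_classes)])
        y = new_y
    # Hard prediction: arg-max class score per node.
    return [max(range(num_classes), key=lambda c: y[v][c]) for v in range(n)]
```

On a six-node path with node 0 seeded as class 0 and node 5 as class 1, each node takes the label of the nearer seed. Hybrid methods pair an iteration of this kind with a GNN's predictions; the exact form ReDiSC's M-step relates to is given in the paper.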

