Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models
For practitioners using masked diffusion models, this method improves inference efficiency by exploiting learned dependency structure without sacrificing output quality.
The authors propose a neural estimator that computes pairwise conditional mutual information from the hidden states of masked diffusion models, enabling MI-guided parallel decoding. On Sudoku and protein generation tasks, this reduces the number of inference forward passes by 3-5x relative to sequential decoding, while maintaining generative quality and outperforming entropy-based parallelization methods.
Understanding dependencies between variables is critical for interpretability and efficient generation in masked diffusion models (MDMs), yet these models primarily expose marginal conditional distributions and do not explicitly represent inter-variable dependence. We propose a neural framework for estimating pairwise conditional mutual information (MI) directly from the hidden states of a pretrained MDM, using ground-truth MI computed from the model's own conditional distributions for supervision. The resulting estimator captures the model's internal belief about dependency structure and predicts the full MI matrix in a single forward pass, enabling MI-guided parallel decoding by identifying conditionally independent subsets of variables. We evaluate our approach on Sudoku and on protein sequence generation with ESM-C, where the MI maps recover known structural constraints and enable a 3-5x reduction in inference-time forward passes compared to sequential decoding, while preserving generative quality and outperforming entropy-based parallelization methods.
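To make the two core ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the pairwise joint for two masked positions can be assembled by the chain rule from the model's conditionals (one query for position i, then one query for position j per candidate value of i), and that a parallel-decoding step greedily keeps positions whose mutual MI falls below a threshold. The function names `joint_from_conditionals`, `pairwise_mi`, and `select_independent_subset`, as well as the threshold rule, are hypothetical and chosen for illustration only.

```python
import numpy as np

def joint_from_conditionals(p_i, p_j_given_i):
    """Chain rule: p(x_i=a, x_j=b | ctx) = p(x_i=a | ctx) * p(x_j=b | x_i=a, ctx).

    p_i has shape (V,); p_j_given_i has shape (V, V) with one row per value of x_i.
    (Assumed way of obtaining the joint; the source does not spell this out.)
    """
    p_i = np.asarray(p_i, dtype=float)
    p_j_given_i = np.asarray(p_j_given_i, dtype=float)
    return p_i[:, None] * p_j_given_i

def pairwise_mi(joint):
    """Mutual information (in nats) of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                # guard against rounding drift
    p_i = joint.sum(axis=1, keepdims=True)     # marginal of the row variable
    p_j = joint.sum(axis=0, keepdims=True)     # marginal of the column variable
    prod = p_i * p_j
    mask = joint > 0                           # treat 0 * log 0 as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / prod[mask])))

def select_independent_subset(mi, candidates, threshold):
    """Greedy rule (hypothetical): keep a position only if its MI with every
    already-kept position is below the threshold."""
    chosen = []
    for i in candidates:
        if all(mi[i, j] < threshold for j in chosen):
            chosen.append(i)
    return chosen
```

Positions returned by `select_independent_subset` would be unmasked together in one forward pass; how candidates are ordered and how the threshold is set are design choices this sketch leaves open.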