LG · Sep 16, 2025

Discovering Mathematical Equations with Diffusion Language Model

arXiv:2509.13136v1 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses the problem of discovering mathematical equations from data for scientific discovery, and represents an incremental improvement over existing methods.

The paper tackles symbolic regression by introducing DiffuSR, a pre-training framework using a continuous-state diffusion language model to discover mathematical equations from data, achieving competitive performance with state-of-the-art methods and generating more interpretable and diverse expressions.

Discovering valid and meaningful mathematical equations from observed data plays a crucial role in scientific discovery. However, this task, known as symbolic regression, remains challenging due to the vast search space and the trade-off between accuracy and complexity. In this paper, we introduce DiffuSR, a pre-training framework for symbolic regression built upon a continuous-state diffusion language model. DiffuSR employs a trainable embedding layer within the diffusion process to map discrete mathematical symbols into a continuous latent space, modeling equation distributions effectively. Through iterative denoising, DiffuSR converts an initial noisy sequence into a symbolic equation, guided by numerical data injected via a cross-attention mechanism. We also design an effective inference strategy that injects logit priors into genetic programming to enhance the accuracy of the diffusion-based equation generator. Experimental results on standard symbolic regression benchmarks demonstrate that DiffuSR achieves performance competitive with state-of-the-art autoregressive methods and generates more interpretable and diverse mathematical expressions.
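The continuous-state idea in the abstract can be illustrated with a minimal sketch. It is not DiffuSR's implementation: the vocabulary, the embedding dimension, the fixed random embeddings (standing in for the trainable embedding layer), and the function names are all assumptions made for illustration, and the learned denoiser and the cross-attention over numerical data are omitted entirely. The sketch only shows the three generic pieces of a continuous diffusion language model: embedding discrete symbols, adding Gaussian noise, and rounding a continuous vector back to the nearest symbol.

```python
import math
import random

# Hypothetical toy vocabulary of equation tokens (prefix notation).
VOCAB = ["add", "mul", "sin", "x", "c"]
DIM = 8  # assumed embedding size, chosen arbitrarily

rng = random.Random(0)
# Stand-in for a trainable embedding layer: fixed random vectors.
EMBED = {tok: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for tok in VOCAB}


def embed(tokens):
    """Map discrete symbols to continuous vectors."""
    return [list(EMBED[t]) for t in tokens]


def add_noise(x0, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [[a * v + b * rng.gauss(0.0, 1.0) for v in vec] for vec in x0]


def round_to_tokens(x):
    """Map each continuous vector back to the nearest token embedding."""
    def nearest(vec):
        return min(
            VOCAB,
            key=lambda t: sum((p - q) ** 2 for p, q in zip(vec, EMBED[t])),
        )
    return [nearest(vec) for vec in x]


# sin(x) + c * x written in prefix notation.
equation = ["add", "sin", "x", "mul", "c", "x"]
x0 = embed(equation)

# Almost no noise: rounding still recovers the original symbols.
slightly_noisy = add_noise(x0, alpha_bar=0.9999, rng=rng)
recovered = round_to_tokens(slightly_noisy)

# Pure noise: the starting point that a learned denoiser would iteratively refine.
very_noisy = add_noise(x0, alpha_bar=0.0, rng=rng)
```

In a full system, a trained network would replace the missing step between `very_noisy` and `round_to_tokens`, repeatedly denoising the sequence while conditioning on the observed data points.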

