LG | AI | Aug 27, 2025

Data-Efficient Symbolic Regression via Foundation Model Distillation

arXiv:2508.19487v1 | 1 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses data-efficient symbolic regression for scientific discovery. The advance is incremental because it builds on existing foundation models.

The paper tackles the poor generalization of foundation models for symbolic regression on small datasets. It introduces EQUATE, a fine-tuning framework that uses distillation and embedding optimization, and reports state-of-the-art accuracy and robustness on standard benchmarks.

Discovering interpretable mathematical equations from observed data (a.k.a. equation discovery or symbolic regression) is a cornerstone of scientific discovery, enabling transparent modeling of physical, biological, and economic systems. While foundation models pre-trained on large-scale equation datasets offer a promising starting point, they often suffer from negative transfer and poor generalization when applied to small, domain-specific datasets. In this paper, we introduce EQUATE (Equation Generation via QUality-Aligned Transfer Embeddings), a data-efficient fine-tuning framework that adapts foundation models for symbolic equation discovery in low-data regimes via distillation. EQUATE combines symbolic-numeric alignment with evaluator-guided embedding optimization, enabling a principled embedding-search-generation paradigm. Our approach reformulates discrete equation search as a continuous optimization task in a shared embedding space, guided by data-equation fitness and simplicity. Experiments across three standard public benchmarks (Feynman, Strogatz, and black-box datasets) demonstrate that EQUATE consistently outperforms state-of-the-art baselines in both accuracy and robustness, while preserving low complexity and fast inference. These results highlight EQUATE as a practical and generalizable solution for data-efficient symbolic regression in foundation model distillation settings.
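The abstract says the search is guided by data-equation fitness and simplicity. As a rough illustration of what such a combined criterion can look like, the sketch below ranks a few hand-written candidate equations on a tiny dataset. It is not EQUATE's evaluator: the `score` function, the `1 / (1 + MSE)` fitness, the penalty weight `lam`, and the per-candidate complexity counts are all assumptions chosen for the example, and the candidates are enumerated directly rather than found by optimization in an embedding space.

```python
import math

def mse(f, xs, ys):
    """Mean squared error of candidate f on the data."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def score(f, complexity, xs, ys, lam=0.01):
    """Hypothetical criterion: fitness minus a simplicity penalty.

    fitness lies in (0, 1], with 1.0 meaning a perfect fit;
    lam (an assumed weight) trades accuracy against expression size.
    """
    fitness = 1.0 / (1.0 + mse(f, xs, ys))
    return fitness - lam * complexity

# A small dataset generated from y = x^2 + x (the low-data regime in miniature).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [x * x + x for x in xs]

# Candidate equations: (callable, assumed complexity, e.g. a node count).
candidates = {
    "x": (lambda x: x, 1),
    "x**2": (lambda x: x * x, 3),
    "x**2 + x": (lambda x: x * x + x, 5),
    # Fits equally well but carries a redundant term, so it is penalized.
    "x**2 + x + 0*sin(x)": (lambda x: x * x + x + 0.0 * math.sin(x), 9),
}

# Rank candidates from best to worst under the combined criterion.
ranked = sorted(
    candidates,
    key=lambda name: score(*candidates[name], xs, ys),
    reverse=True,
)
best = ranked[0]
print(best)
```

Both exact-fit candidates reach a fitness of 1.0, so the simplicity term alone decides between them and the shorter expression ranks first.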

