Bayesian Symbolic Regression via Posterior Sampling
This incremental advance addresses noise sensitivity in symbolic regression for scientific discovery and engineering design applications.
The paper tackles the noise sensitivity of symbolic regression by introducing a Bayesian Sequential Monte Carlo (SMC) framework; compared with genetic programming baselines, it improves robustness and generalization and reduces overfitting.
Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise hinders its broader application. This paper introduces a Sequential Monte Carlo (SMC) framework for Bayesian symbolic regression that approximates the posterior distribution over symbolic expressions, enhancing robustness and enabling uncertainty quantification for symbolic regression in the presence of noise. Unlike traditional genetic programming approaches, the SMC-based algorithm combines probabilistic selection, adaptive tempering, and the normalized marginal likelihood to efficiently explore the search space of symbolic expressions, yielding parsimonious expressions with improved generalization. Compared with standard genetic programming baselines, the proposed method performs better on challenging, noisy benchmark datasets. Its reduced tendency to overfit and its enhanced ability to discover accurate, interpretable equations pave the way for more robust symbolic regression in scientific discovery and engineering design applications.
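The abstract names adaptive tempering and probabilistic selection without spelling out their form. The Python sketch below shows one generic, commonly used version of both; it is not the paper's algorithm. Assumptions: each candidate expression already carries a fixed log-likelihood score (the values below are made up), the sampler targets a tempered posterior proportional to prior x likelihood^beta with beta rising from 0 to 1, the next beta is chosen by bisection so that the effective sample size (ESS) stays at half the population, and multinomial resampling stands in for probabilistic selection. The MCMC mutation step that would propose new expressions and the paper's normalized marginal likelihood are both omitted. All function names are hypothetical.

```python
import numpy as np


def normalized_weights(logw):
    """Turn log-weights into probabilities, subtracting the max for numerical stability."""
    w = np.exp(logw - np.max(logw))
    return w / w.sum()


def effective_sample_size(logw):
    """Kish effective sample size, 1 / sum(w_i^2), of the normalized weights."""
    w = normalized_weights(logw)
    return 1.0 / np.sum(w ** 2)


def next_temperature(log_like, beta, target_ess_frac=0.5, iters=60):
    """Adaptive tempering (assumed ESS-based rule): return the largest beta' in
    (beta, 1] whose incremental weights exp((beta' - beta) * log_like) keep
    ESS >= target_ess_frac * N. ESS is non-increasing in beta', so bisection works."""
    target = target_ess_frac * len(log_like)
    if effective_sample_size((1.0 - beta) * log_like) >= target:
        return 1.0  # the whole remaining step is safe
    lo, hi = beta, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if effective_sample_size((mid - beta) * log_like) >= target:
            lo = mid
        else:
            hi = mid
    return lo


def smc_step(particles, log_like, beta, rng, target_ess_frac=0.5):
    """One reweight-and-resample step; the MCMC mutation that would propose
    new expressions is deliberately left out of this sketch."""
    new_beta = next_temperature(log_like, beta, target_ess_frac)
    w = normalized_weights((new_beta - beta) * log_like)
    # Probabilistic selection: multinomial resampling in proportion to weight.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return [particles[i] for i in idx], log_like[idx], new_beta


# Hypothetical population of candidate expressions with made-up log-likelihoods.
rng = np.random.default_rng(0)
particles = ["x", "x**2", "sin(x)", "x + 1"]
log_like = np.array([-50.0, -10.0, -12.0, -40.0])
beta = 0.0
while beta < 1.0:
    particles, log_like, beta = smc_step(particles, log_like, beta, rng)
    print(f"beta={beta:.3f}  population={particles}")
```

Choosing beta adaptively lets the sampler take large steps while the candidates score similarly and small steps when a few expressions dominate. In contrast to the deterministic survival typical of genetic programming selection, weaker expressions keep a nonzero chance of surviving each resampling.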