Bayesian symbolic regression: Automated equation discovery from a physicists' perspective
This work addresses the need for more rigorous, theoretically grounded methods in symbolic regression. It is aimed at physicists and other researchers who want automated equation discovery that is more reliable than heuristic approaches.
The paper tackles the problem of learning closed-form mathematical models from data by presenting a probabilistic approach to symbolic regression. This approach establishes model plausibility from basic considerations and explicit approximations, and it provides performance guarantees that heuristic methods lack.
Symbolic regression automates the process of learning closed-form mathematical models from data. Standard approaches to symbolic regression, as well as newer deep learning approaches, rely on heuristic model selection criteria, heuristic regularization, and heuristic exploration of model space. Here, we discuss the probabilistic approach to symbolic regression, an alternative to such heuristic approaches with direct connections to information theory and statistical physics. We show how the probabilistic approach establishes model plausibility from basic considerations and explicit approximations, and how it provides guarantees of performance that heuristic approaches lack. We also discuss how the probabilistic approach compels us to consider model ensembles, as opposed to single models.
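As a rough illustration of how model plausibility can lead to an ensemble rather than a single model, the following minimal sketch scores a few candidate expressions and converts the scores into approximate posterior weights. It is not the paper's implementation: the candidate set, the synthetic data, the uniform prior over expressions, and the use of the Bayesian information criterion (BIC) as the explicit approximation to the marginal likelihood are all assumptions made here for illustration.

```python
import numpy as np


def bic(y, y_hat, k):
    """BIC for Gaussian residuals with k fitted parameters (up to a constant)."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def posterior_weights(bics, log_priors):
    """Approximate p(f | D) using a description length L = BIC/2 - log p(f)."""
    dl = 0.5 * np.asarray(bics) - np.asarray(log_priors)
    w = np.exp(-(dl - dl.min()))  # subtract the minimum for numerical stability
    return w / w.sum()


# Synthetic data (assumption): a quadratic law with small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 3.0, 60)
y = 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Hypothetical candidate expressions, each linear in its parameters,
# represented by a design matrix so that least squares can fit them.
candidates = {
    "a*x": np.column_stack([x]),
    "a*x**2": np.column_stack([x**2]),
    "a*x**2 + b*x": np.column_stack([x**2, x]),
}

bics, preds = [], []
for name, X in candidates.items():
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit the parameters
    y_hat = X @ theta
    preds.append(y_hat)
    bics.append(bic(y, y_hat, X.shape[1]))

# Uniform prior over expressions (assumption); a real prior would
# typically favor simpler or more commonly observed expressions.
log_priors = np.log(np.full(len(candidates), 1.0 / len(candidates)))
weights = posterior_weights(bics, log_priors)

# Ensemble prediction: average the candidates' predictions by plausibility.
ensemble_prediction = np.average(np.array(preds), axis=0, weights=weights)

for name, w in zip(candidates, weights):
    print(f"{name:>14s}  weight = {w:.3g}")
```

Under these assumptions, predictions are averaged over all candidates in proportion to their weights, so a model that fits poorly contributes almost nothing while comparably plausible models share the weight.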