LG · CO · IM · Apr 13, 2023

Priors for symbolic regression

arXiv:2304.06333v2 · 12 citations · h-index: 20
Originality: Incremental advance
AI Analysis

This work addresses the challenge of model selection in symbolic regression for researchers in fields like cosmology, offering a more principled way to incorporate prior information, though it is incremental as it builds on existing Bayesian and language model techniques.

The authors tackled the problem of incorporating prior knowledge into symbolic regression to prefer simpler or more familiar expressions, developing a method that uses an n-gram language model for function structure and a Bayesian approach for parameter priors. They demonstrated improved performance on benchmarks and a cosmology dataset, showing competitive results compared to standard methods.
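The key property of an n-gram prior on function structure is that it scores how operators are arranged, not just how often they occur. A toy bigram (n = 2) model shows this. This is a minimal sketch under stated assumptions, not the authors' implementation: the corpus, the use of each node's parent as its context, the mapping of all non-variable leaves to a single parameter token, and add-one smoothing are all hypothetical choices made for illustration. The two scored expressions contain exactly the same tokens, so a frequency-only (unigram) prior would rate them equally, while the bigram score separates them.

```python
import math
from collections import Counter

# Expressions are nested tuples (operator, child, ...); leaves are strings.
# "x" is the input variable; any other leaf is treated as a free parameter "a".
# Hypothetical toy corpus standing in for a corpus of "familiar" equations.
CORPUS = [
    ("+", ("*", "a", "x"), "b"),                # a*x + b
    ("*", "a", ("pow", "x", "b")),              # a*x**b
    ("+", "a", ("*", "b", ("exp", "x"))),       # a + b*exp(x)
    ("*", "a", ("sin", ("*", "b", "x"))),       # a*sin(b*x)
]
VOCAB = {"+", "*", "pow", "exp", "sin", "x", "a"}


def label(node):
    """Token for a node: the operator name, 'x' for the variable, 'a' for parameters."""
    if isinstance(node, tuple):
        return node[0]
    return "x" if node == "x" else "a"


def parent_child_pairs(node, parent="<root>"):
    """Yield (context, token) for every node, where context is the parent's token."""
    yield parent, label(node)
    if isinstance(node, tuple):
        for child in node[1:]:
            yield from parent_child_pairs(child, label(node))


class BigramPrior:
    """Add-one-smoothed bigram model over parent -> child tokens in expression trees."""

    def __init__(self, corpus, vocab):
        self.vocab_size = len(vocab)
        self.pair_counts = Counter()
        self.context_counts = Counter()
        for expr in corpus:
            for context, token in parent_child_pairs(expr):
                self.pair_counts[(context, token)] += 1
                self.context_counts[context] += 1

    def log_prob(self, expr):
        """Log-prior score: sum over nodes of log P(token | parent token).

        Not normalised over valid trees, since operator arity is ignored.
        """
        total = 0.0
        for context, token in parent_child_pairs(expr):
            num = self.pair_counts[(context, token)] + 1
            den = self.context_counts[context] + self.vocab_size
            total += math.log(num / den)
        return total


prior = BigramPrior(CORPUS, VOCAB)
# Same operators and leaves, different arrangement:
lp_outer = prior.log_prob(("*", "a", ("exp", "x")))   # a*exp(x)
lp_inner = prior.log_prob(("exp", ("*", "a", "x")))   # exp(a*x)
print(f"{lp_outer - lp_inner:.3f}")  # prints 1.386: a*exp(x) is 4x more probable here
```

Longer contexts (n > 2, e.g. grandparent and parent) follow the same pattern with tuple-valued contexts; the smoothing or back-off scheme then matters more, because most long contexts are unseen in a small corpus.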

When choosing between competing symbolic models for a data set, a human will naturally prefer the "simpler" expression or the one which more closely resembles equations previously seen in a similar context. This suggests a non-uniform prior on functions, which is, however, rarely considered within a symbolic regression (SR) framework. In this paper we develop methods to incorporate detailed prior information on both functions and their parameters into SR. Our prior on the structure of a function is based on an $n$-gram language model, which is sensitive to the arrangement of operators relative to one another in addition to the frequency of occurrence of each operator. We also develop a formalism based on the Fractional Bayes Factor to treat numerical parameter priors in such a way that models may be fairly compared through the Bayesian evidence, and explicitly compare Bayesian, Minimum Description Length and heuristic methods for model selection. We demonstrate the performance of our priors relative to literature standards on benchmarks and a real-world dataset from the field of cosmology.
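The Fractional Bayes Factor (O'Hagan, 1995) makes vague or improper parameter priors usable in evidence-based model comparison. Each model's likelihood $L$ is raised to a small power $b$, and the model is scored by $q_b = \int \pi L \, d\theta \,/ \int \pi L^b \, d\theta$, so an arbitrary normalisation of the prior $\pi$ cancels. The sketch below shows only the mechanism, in the simplest closed-form case. It assumes a model linear in its parameters, Gaussian noise of known $\sigma$ and a flat prior. The paper applies the FBF to general symbolic expressions, so this is not the authors' formalism. The helper name `log_fractional_evidence` and the choice $b = m/n$, with $m$ the largest number of parameters among the compared models, are hypothetical.

```python
import numpy as np


def log_fractional_evidence(X, y, sigma, b):
    """Log of the FBF "partial evidence" q_b = ∫π(θ)L(θ)dθ / ∫π(θ)L(θ)^b dθ.

    Assumes y = X @ theta + Gaussian noise with known sigma, a flat (improper)
    prior π(θ) = const whose constant cancels in the ratio, and X^T X of full
    rank so that both integrals are finite. Both integrals are Gaussian, giving
        log q_b = -(n(1-b)/2) log(2πσ²) - (1-b) RSS / (2σ²) + (k/2) log b,
    where RSS is the residual sum of squares at the least-squares fit.
    """
    n, k = X.shape
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ theta_hat) ** 2))
    return (-0.5 * n * (1.0 - b) * np.log(2.0 * np.pi * sigma**2)
            - (1.0 - b) * rss / (2.0 * sigma**2)
            + 0.5 * k * np.log(b))


# Usage: log FBF of a straight line against a cubic on data drawn from a line.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
sigma = 0.1
y = 1.0 + 2.0 * x + sigma * rng.standard_normal(x.size)
X_lin = np.vander(x, 2, increasing=True)   # columns: 1, x
X_cub = np.vander(x, 4, increasing=True)   # columns: 1, x, x^2, x^3
b = X_cub.shape[1] / x.size                # hypothetical choice b = m/n, same b for both models
log_fbf = (log_fractional_evidence(X_lin, y, sigma, b)
           - log_fractional_evidence(X_cub, y, sigma, b))
print(log_fbf)  # positive values favour the linear model
```

Raising the likelihood to the power $b$ uses a fraction of the data to turn the flat prior into an effectively proper one. The $(k/2)\log b$ term is then negative and grows with the number of parameters $k$, so it acts as an Occam penalty that extra parameters must overcome through a lower RSS.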

Code Implementations: 1 repo