P(Expression|Grammar): Probability of deriving an algebraic expression with a probabilistic context-free grammar
The work addresses a foundational question for machine learning researchers working on symbolic regression and generative modeling, although its contribution is incremental, since it builds on existing grammar-based methods.
The paper studies the problem of calculating the probability that a probabilistic context-free grammar derives a given algebraic expression. It shows that the problem is undecidable in general, and it provides exact and approximate algorithms for specific grammars that generate linear, polynomial, and rational expressions.
Probabilistic context-free grammars have a long history of use as generative models in machine learning and symbolic regression. When used for symbolic regression, they generate algebraic expressions. We define the latter as equivalence classes of strings derived by a grammar and address the problem of calculating the probability of deriving a given expression with a given grammar. We show that the problem is undecidable in general. We then present specific grammars for generating linear, polynomial, and rational expressions, for which algorithms for calculating the probability of a given expression exist. For those grammars, we design algorithms that calculate the exact probability and that efficiently approximate it to arbitrary precision.
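To make the notion of "probability of an expression" concrete, the following minimal sketch sums the probabilities of all strings in one equivalence class. It is only an illustration under stated assumptions, not the grammars or algorithms from the paper: the toy grammar E -> E + V | V, V -> x | y, the rule probabilities P_EXTEND and P_X, and the choice to treat two strings as equivalent when they differ only in the order of their terms are all hypothetical.

```python
from itertools import product
from math import comb, isclose

# Hypothetical toy grammar (an assumption, not taken from the paper):
#   E -> E + V   with probability P_EXTEND
#   E -> V       with probability 1 - P_EXTEND
#   V -> x       with probability P_X
#   V -> y       with probability 1 - P_X
P_EXTEND = 0.4
P_X = 0.7


def string_probability(terms):
    """Probability of deriving one specific string, e.g. ('x', 'y') for 'x + y'.

    The grammar is unambiguous, so each string has exactly one derivation.
    """
    n = len(terms)
    prob = P_EXTEND ** (n - 1) * (1 - P_EXTEND)  # n - 1 extensions, then stop
    for term in terms:
        prob *= P_X if term == "x" else (1 - P_X)
    return prob


def expression_probability(terms):
    """Probability of the expression, i.e. of its whole equivalence class.

    Assumption: strings are equivalent when they contain the same terms in
    any order (commutativity of +). Like terms are not merged.
    """
    target = sorted(terms)
    candidates = product("xy", repeat=len(terms))
    return sum(string_probability(c) for c in candidates if sorted(c) == target)


def closed_form(terms):
    """Same value without enumeration: count the reorderings directly."""
    n, k = len(terms), list(terms).count("x")
    return comb(n, k) * string_probability(terms)


if __name__ == "__main__":
    expr = ("x", "y")
    print(expression_probability(expr), closed_form(expr))
    assert isclose(expression_probability(expr), closed_form(expr))
```

For the expression x + y, the class contains the two strings "x + y" and "y + x", each with probability 0.4 · 0.6 · 0.7 · 0.3 = 0.0504, so the expression has probability of about 0.1008. Brute-force enumeration works here only because this toy equivalence preserves string length; richer algebraic equivalences can place infinitely many strings in one class, which is where dedicated exact and approximate algorithms become necessary.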