LG | AI | FL | Jun 14, 2023

Probabilistic Regular Tree Priors for Scientific Symbolic Reasoning

arXiv:2306.08506v2 | 2 citations | h-index: 41
Originality: Incremental advance
AI Analysis

This work addresses the problem of incorporating expert knowledge into symbolic regression for scientific discovery, representing an incremental improvement over existing grammar-based methods.

The paper tackles the mismatch between context-free grammars and tree structures in symbolic regression. It introduces probabilistic Regular Tree Expressions (pRTE) to compactly encode expert priors, and it adapts Bayesian inference so that symbolic regression can use those priors efficiently. Case studies in soil science and hyper-elastic material modeling demonstrate the approach, though no concrete performance numbers are provided.

Symbolic Regression (SR) allows for the discovery of scientific equations from data. To limit the large search space of possible equations, prior knowledge has been expressed in terms of formal grammars that characterize subsets of arbitrary strings. However, there is a mismatch between the context-free grammars required to express the set of syntactically correct equations, which lack some closure properties, and the tree structure of the equations themselves. Our contributions are to (i) compactly express experts' prior beliefs about which equations are more likely to be expected using probabilistic Regular Tree Expressions (pRTE), and (ii) adapt Bayesian inference to make such priors efficiently available for symbolic regression encoded as finite state machines. Our scientific case studies show the approach's effectiveness in soil science, for finding sorption isotherms, and in modeling hyper-elastic materials.
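
To make the idea of a prior over equation trees more concrete, here is a minimal sketch in Python. It is not the paper's pRTE formalism or its inference procedure. It is a toy probabilistic regular tree grammar in which every state name ("Sum", "Prod", "Leaf"), every weight, and the helpers `log_prior` and `sample` are assumptions made up for illustration. It shows only how a tree-structured prior can assign a probability to an expression tree and rule out shapes an expert considers implausible.

```python
import math
import random

# Hypothetical toy prior (not taken from the paper): each state maps to
# weighted productions of the form (probability, symbol, child_states).
# Probabilities within a state sum to 1.
RULES = {
    "Sum": [
        (0.3, "+", ("Sum", "Prod")),   # left-nested sums only
        (0.3, "*", ("Prod", "Leaf")),
        (0.3, "x", ()),
        (0.1, "c", ()),
    ],
    "Prod": [
        (0.4, "*", ("Prod", "Leaf")),
        (0.4, "x", ()),
        (0.2, "c", ()),
    ],
    "Leaf": [
        (0.7, "x", ()),
        (0.3, "c", ()),
    ],
}


def log_prior(tree, state="Sum"):
    """Log-probability of an expression tree, given as nested tuples
    such as ("+", ("x",), ("c",)). Returns -inf if the tree is not
    derivable from the given state."""
    symbol, children = tree[0], tree[1:]
    for prob, sym, child_states in RULES[state]:
        if sym == symbol and len(child_states) == len(children):
            return math.log(prob) + sum(
                log_prior(child, child_state)
                for child, child_state in zip(children, child_states)
            )
    return -math.inf


def sample(state="Sum", rng=random):
    """Draw a random expression tree by expanding states top-down."""
    rules = RULES[state]
    _, sym, child_states = rng.choices(rules, weights=[r[0] for r in rules])[0]
    return (sym,) + tuple(sample(s, rng) for s in child_states)


if __name__ == "__main__":
    favoured = ("+", ("x",), ("*", ("x",), ("c",)))   # x + x*c
    excluded = ("+", ("x",), ("+", ("x",), ("c",)))   # right-nested sum
    print(log_prior(favoured))
    print(log_prior(excluded))
    print(sample(rng=random.Random(0)))
```

In a Bayesian setting, a score of this kind would be added to the log-likelihood of the data under a candidate equation, so that candidates with equal fit are ranked by how well they match the expert's expectations. How the paper itself combines prior and likelihood is not specified in this summary.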

