Active Learning Improves Performance on Symbolic Regression Tasks in StackGP
This work addresses data efficiency in symbolic regression, a concern for both researchers and practitioners, though the contribution is incremental because it builds on existing StackGP methods.
The paper tackles symbolic regression by introducing an active learning method for StackGP that selects data points to maximize prediction uncertainty, aiming to find appropriate models with fewer data points. It successfully rediscovers 72 out of 100 Feynman equations using minimal data, without domain expertise or data translation.
In this paper, we introduce an active learning method for symbolic regression using StackGP. The approach begins with a small number of data points for StackGP to model. To improve the model, the system incrementally adds a data point chosen to maximize prediction uncertainty, as measured by the model ensemble. Symbolic regression is then re-run with the larger data set. This cycle continues until the system satisfies a termination criterion. We use the Feynman AI benchmark set of equations to examine the ability of our method to find appropriate models using fewer data points. The approach successfully rediscovered 72 of the 100 Feynman equations using as few data points as possible, and without the use of domain expertise or data translation.
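The cycle described above (fit, find the most uncertain point, add it, refit) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the StackGP implementation: an ensemble of least-squares polynomial fits stands in for the symbolic regression models, uncertainty is taken to be the standard deviation of the ensemble predictions, and the termination criterion (a disagreement threshold or a point budget) is an assumption, since the abstract does not specify one. The names `target`, `fit_ensemble`, `ensemble_uncertainty`, and `active_learning` are hypothetical.

```python
import numpy as np


def target(x):
    # Hypothetical ground truth standing in for one benchmark equation.
    return 0.5 * x**2


def fit_ensemble(x, y, degrees=(2, 3, 4)):
    # Stand-in for a symbolic regression run (assumption): one least-squares
    # polynomial per degree. lstsq returns the minimum-norm solution when the
    # system is underdetermined, so the models can disagree off the data.
    return [np.linalg.lstsq(np.vander(x, d + 1), y, rcond=None)[0]
            for d in degrees]


def ensemble_uncertainty(models, x):
    # Uncertainty proxy (assumption): spread of the ensemble predictions.
    preds = np.stack([np.polyval(c, x) for c in models])
    return preds.std(axis=0)


def active_learning(pool, n_init=3, max_points=20, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a small random subset of the candidate pool.
    chosen = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    while True:
        x = pool[chosen]
        y = target(x)
        models = fit_ensemble(x, y)
        unc = ensemble_uncertainty(models, pool)
        unc[chosen] = -np.inf  # never select a point that is already in use
        best = int(np.argmax(unc))
        # Assumed termination criterion: the models agree everywhere on the
        # remaining pool, or the data budget has been reached.
        if unc[best] < tol or len(chosen) >= max_points:
            return models, x
        chosen.append(best)


models, x = active_learning(np.linspace(-3.0, 3.0, 61))
print(f"points used: {len(x)}")
print("degree-2 coefficients:", np.round(models[0], 6))
```

In this toy setting the ensemble disagrees only while the data underdetermine the higher-degree models, so the loop tends to stop after a handful of points. With a real symbolic regression system, the fit step would be a full evolutionary run and the ensemble would be drawn from its final population.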