Symbolic regression by uniform random global search
This work provides a simple control algorithm for symbolic regression, useful as a baseline for benchmarking and for generating benchmark problems, though it is incremental in nature.
The paper tackles symbolic regression by comparing a novel uniform random global search algorithm (SRURGS) to genetic programming (SRGP) on 100 randomly generated equations, finding that SRGP is faster at producing equations with good R^2 on simple problems but SRURGS is more robust on challenging ones.
Symbolic regression (SR) is a data analysis problem in which we search for the mathematical expression that best fits a numerical dataset. It is a global optimization problem. The most popular approach to SR is genetic programming (SRGP). It is common practice to compare an algorithm's performance to that of random search, but data comparing SRGP to random search are lacking. We describe a novel algorithm for SR based on uniform random global search, also known as pure random search, which we call SRURGS. We conduct experiments comparing SRURGS with SRGP using 100 randomly generated equations. Our results suggest that SRGP is faster than SRURGS at producing equations with good R^2 for simple problems. However, our experiments also suggest that SRURGS is more robust than SRGP, as it is able to produce good output on more challenging problems. As SRURGS is arguably the simplest global search algorithm, we believe it should serve as a control algorithm against which other symbolic regression algorithms are compared. SRURGS has only one tuning parameter and is conceptually very simple, making it a useful tool for solving SR problems. Because the method produces random equations, it is also useful for generating symbolic regression benchmark problems. We have released well-documented, open-source Python code, currently under formal peer review, so that interested researchers can deploy the tool in practice.
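To make the idea of pure random search concrete, the following is a minimal sketch and not the released SRURGS code. It draws candidate expressions at random, scores each by R^2 on the dataset, and keeps the best one seen. Everything in it is an assumption made for illustration: the primitive set, the function names (`random_expression`, `evaluate`, `r_squared`, `random_search`), and the parameters `n_iterations`, `max_depth`, and `seed`. In particular, the sketch grows trees by a simple random process, which does not sample uniformly from the space of equations as SRURGS is described as doing.

```python
import random

# Hypothetical primitive set, chosen for illustration only.
BINARY_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def random_expression(rng, n_vars, max_depth):
    """Grow a random expression tree, represented as nested tuples.

    Note: this growth process is NOT uniform over the equation space.
    """
    if max_depth == 0 or rng.random() < 0.3:
        # Leaf: an input variable or a small constant.
        if rng.random() < 0.7:
            return ("var", rng.randrange(n_vars))
        return ("const", float(rng.randint(1, 3)))
    op = rng.choice(sorted(BINARY_OPS))
    left = random_expression(rng, n_vars, max_depth - 1)
    right = random_expression(rng, n_vars, max_depth - 1)
    return (op, left, right)


def evaluate(expr, row):
    """Evaluate an expression tree on one row of input values."""
    kind = expr[0]
    if kind == "var":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return BINARY_OPS[kind](evaluate(expr[1], row), evaluate(expr[2], row))


def r_squared(expr, X, y):
    """Coefficient of determination of the expression's predictions."""
    preds = [evaluate(expr, row) for row in X]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    if ss_tot == 0:
        return 1.0 if ss_res == 0 else float("-inf")
    return 1.0 - ss_res / ss_tot


def random_search(X, y, n_iterations, max_depth=3, seed=0):
    """Pure random search: sample candidates independently, keep the best."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    best_expr, best_score = None, float("-inf")
    for _ in range(n_iterations):
        candidate = random_expression(rng, n_vars, max_depth)
        score = r_squared(candidate, X, y)
        if score > best_score:
            best_expr, best_score = candidate, score
    return best_expr, best_score
```

Calling `random_search(X, y, n_iterations=1000)` on a list of input rows `X` and targets `y` returns the best expression found and its R^2. Unlike genetic programming, no candidate depends on any earlier candidate, which is what makes this kind of search a natural control.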