A divide and conquer method for symbolic regression
This work addresses the poor scalability of symbolic regression on high-dimensional data, a problem that matters to researchers and practitioners in that setting. It is, however, an incremental improvement over existing GP methods.
The paper tackles the slow convergence of genetic programming (GP) in symbolic regression on large-scale problems. It proposes a divide and conquer (D&C) method that exploits separability in the target function, uses a Bi-Correlation test (BiCT) to probe that separability, and shows that D&C helps GP find target functions much more rapidly in real-world applications.
Symbolic regression aims to find a function that best explains the relationship between the independent variables and the objective value, based on a given set of sample data. Genetic programming (GP) is usually considered an appropriate method for this problem because it can optimize the functional structure and the coefficients simultaneously. However, GP may converge too slowly on large-scale problems that involve many variables. Fortunately, in many applications the target function is separable or partially separable. This feature motivated us to develop a new method for symbolic regression, divide and conquer (D&C), in which the target function is divided into a number of sub-functions and each sub-function is then determined by any GP algorithm. Separability is probed by a newly proposed technique, the Bi-Correlation test (BiCT). D&C-powered GP has been tested on some real-world applications, and the study shows that D&C helps GP find the target function much more rapidly.
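The abstract names BiCT but does not spell out its procedure, so the following is only a minimal sketch under our own assumptions, not the paper's algorithm. It assumes "separable" means f(x) = g(x_A) + h(x_B) or f(x) = g(x_A) * h(x_B) for a split of the variables into blocks A and B. Under that assumption, holding x_B at two different settings b1 and b2 while sampling the same x_A values yields two output vectors that differ only by a constant shift (g + h(b1) versus g + h(b2)) or a constant factor (g * h(b1) versus g * h(b2)), so their correlation is +1 or -1. Repeating the probe with the roles of A and B swapped gives the second correlation. The helper names (`bi_correlation_test`, `split_separable`, `correlation_probe`), the sampling range, the sample size, and the tolerance are all hypothetical choices for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Func = Callable[[Sequence[float]], float]


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    su = math.sqrt(sum(a * a for a in du))
    sv = math.sqrt(sum(b * b for b in dv))
    if su == 0.0 or sv == 0.0:
        # A constant response means f ignores the varied block here;
        # treat that as trivially consistent with separability.
        return 1.0
    return sum(a * b for a, b in zip(du, dv)) / (su * sv)


def correlation_probe(f: Func, dim: int, vary: List[int], n: int,
                      lo: float, hi: float, rng: random.Random) -> float:
    """|corr| between f evaluated on shared samples of the `vary` block,
    with every other variable held at two different random anchors."""
    anchor1 = [rng.uniform(lo, hi) for _ in range(dim)]
    anchor2 = [rng.uniform(lo, hi) for _ in range(dim)]
    y1, y2 = [], []
    for _ in range(n):
        p1, p2 = list(anchor1), list(anchor2)
        for i in vary:
            # The varied block gets the same fresh value in both copies.
            p1[i] = p2[i] = rng.uniform(lo, hi)
        y1.append(f(p1))
        y2.append(f(p2))
    return abs(pearson(y1, y2))


def bi_correlation_test(f: Func, dim: int, block_a: List[int], n: int = 50,
                        lo: float = -1.0, hi: float = 1.0,
                        tol: float = 1e-8, seed: int = 0) -> bool:
    """Hypothetical BiCT-style check: True if f looks additively or
    multiplicatively separable between block_a and the remaining variables."""
    rng = random.Random(seed)
    block_b = [i for i in range(dim) if i not in block_a]
    return (correlation_probe(f, dim, block_a, n, lo, hi, rng) > 1 - tol
            and correlation_probe(f, dim, block_b, n, lo, hi, rng) > 1 - tol)


def split_separable(f: Func, dim: int, **kw) -> List[List[int]]:
    """Simplified, hypothetical 'divide' step: peel off every variable that
    is separable from all the others; the remaining ones stay in one block."""
    singles: List[List[int]] = []
    rest: List[int] = []
    for i in range(dim):
        if bi_correlation_test(f, dim, [i], **kw):
            singles.append([i])
        else:
            rest.append(i)
    return singles + ([rest] if rest else [])


# Usage: x0 * x1 cannot be split, but cos(x2) is an additive term.
blocks = split_separable(lambda x: x[0] * x[1] + math.cos(x[2]), 3)
print(blocks)  # [[2], [0, 1]]
```

The tight tolerance only makes sense for noise-free evaluations of f. With noisy sample data, as in real applications, it would need to be loosened. In a full D&C pipeline each resulting block would then be handed to a GP run and the fitted sub-functions recombined; how the paper performs that recombination is not described here, so the sketch stops at the divide step.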