Discovering physical laws with parallel symbolic enumeration
This work addresses the accuracy and efficiency bottleneck in symbolic regression, which could enable faster scientific exploration across interdisciplinary domains, though it appears incremental in that it builds on existing methods to improve performance.
The paper tackles the challenge of efficiently discovering parsimonious and generalizable mathematical expressions from data in symbolic regression. It introduces parallel symbolic enumeration (PSE), which improves recovery accuracy by up to 99% and reduces runtime by an order of magnitude compared to state-of-the-art baselines across over 200 problem sets.
Symbolic regression plays a crucial role in modern scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. A key challenge lies in searching an infinite space for parsimonious and generalizable mathematical formulas that also fit the training data. For over a decade, existing algorithms have faced a critical bottleneck in accuracy and efficiency when handling complex problems, which hinders the pace of applying symbolic regression to scientific exploration across interdisciplinary domains. To this end, we introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data. Experiments show that PSE achieves higher accuracy and faster computation than state-of-the-art baseline algorithms across over 200 synthetic and experimental problem sets (e.g., improving the recovery accuracy by up to 99% and reducing runtime by an order of magnitude). PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models (e.g., underlying physical laws), and improves the scalability of symbolic learning.
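To make the search problem described above concrete, the following is a minimal sketch of brute-force symbolic enumeration on a toy data set. It is not the PSE algorithm, whose details are not given here; it only illustrates the idea of enumerating candidate expressions, scoring each against the data, and preferring the shortest one that fits. The hidden law y = x*x + x, the operator set, the depth limit, and all names (enumerate_expressions, search) are assumptions chosen for illustration.

```python
import itertools

# Toy data from a hidden law y = x*x + x (an illustrative assumption).
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [x * x + x for x in xs]

# Hypothetical operator set; division is omitted to avoid divide-by-zero handling.
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def enumerate_expressions(max_depth):
    """Return (text, function) pairs for all expressions up to max_depth."""
    # Depth 0: the variable x and the constant 1.
    exprs = [("x", lambda x: x), ("1", lambda x: 1.0)]
    for _ in range(max_depth):
        new = []
        # Combine every ordered pair of known expressions with every operator.
        for (ta, fa), (tb, fb) in itertools.product(exprs, repeat=2):
            for sym, op in BINARY_OPS.items():
                text = f"({ta} {sym} {tb})"
                # Default arguments freeze fa, fb, op for this candidate.
                fn = lambda x, fa=fa, fb=fb, op=op: op(fa(x), fb(x))
                new.append((text, fn))
        exprs = exprs + new
    return exprs


def mean_squared_error(fn, xs, ys):
    return sum((fn(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def search(xs, ys, max_depth=2):
    """Pick the candidate with the lowest error, breaking ties by text length."""
    scored = [
        (mean_squared_error(fn, xs, ys), len(text), text, fn)
        for text, fn in enumerate_expressions(max_depth)
    ]
    err, _, text, fn = min(scored, key=lambda item: (item[0], item[1]))
    return text, fn, err


best_text, best_fn, best_err = search(xs, ys)
print(best_text, best_err)
```

Even in this toy setting the candidate count grows quickly with depth (2 leaves, then 14 expressions, then 602), which hints at why efficiency becomes the bottleneck for realistic operator sets and deeper expressions.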