LG · Jun 7, 2023

Scalable Neural Symbolic Regression using Control Variables

arXiv:2306.04718v2 · 2 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This work addresses scalability challenges for researchers and practitioners in the natural sciences who need interpretable mathematical expressions from data. It appears incremental because it builds on existing symbolic regression methods.

The authors tackle the scalability of symbolic regression for complex equations with multiple variables. They propose ScaleSR, which uses control variables to decompose a multi-variable problem into single-variable ones, and they report that it significantly outperforms state-of-the-art baselines on benchmark datasets.

Symbolic regression (SR) is a powerful technique for discovering analytical mathematical expressions from data, and it has found various applications in the natural sciences because its results are interpretable. However, existing methods face scalability issues when dealing with complex equations involving multiple variables. To address this challenge, we propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability. The core idea is to decompose multi-variable symbolic regression into a set of single-variable SR problems, which are then combined in a bottom-up manner. The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs). Second, the data generator is used to generate samples for a certain variable by controlling the other input variables. Third, single-variable symbolic regression is applied to estimate the corresponding mathematical expression. Lastly, we repeat steps 2 and 3, adding variables one by one until all variables are covered. We evaluate the performance of our method on multiple benchmark datasets. Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables. Moreover, it can substantially reduce the search space for symbolic regression. The source code will be made publicly available upon publication.
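
The four steps above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the generator architecture or the single-variable SR engine, so everything here is an assumption made for illustration. The hypothetical `generator` is a known function standing in for the learned DNN, `single_variable_sr` is a brute-force search over a tiny hand-picked library of forms `y = a * g(x) + b` rather than a real SR method, and the sketch assumes the target decomposes as `a(x2) * g(x1) + b(x2)`.

```python
import math

# Hypothetical candidate forms y = a * g(x) + b. "const" comes first so that
# ties resolve to the simplest form.
LIBRARY = [
    ("const", lambda x: 0.0),
    ("identity", lambda x: x),
    ("square", lambda x: x * x),
    ("sin", math.sin),
]

def fit_form(g, xs, ys):
    """Closed-form least squares for y = a * g(x) + b; returns (a, b, mse)."""
    gs = [g(x) for x in xs]
    n = len(xs)
    g_mean = sum(gs) / n
    y_mean = sum(ys) / n
    var = sum((v - g_mean) ** 2 for v in gs)
    cov = sum((v - g_mean) * (y - y_mean) for v, y in zip(gs, ys))
    a = cov / var if var > 1e-12 else 0.0  # constant basis: slope is undefined
    b = y_mean - a * g_mean
    mse = sum((a * v + b - y) ** 2 for v, y in zip(gs, ys)) / n
    return a, b, mse

def single_variable_sr(xs, ys, tol=1e-9):
    """Toy single-variable SR: keep a form only if it beats the best by tol."""
    best = None
    for name, g in LIBRARY:
        a, b, mse = fit_form(g, xs, ys)
        if best is None or mse < best[3] - tol:
            best = (name, a, b, mse)
    return best

def generator(x1, x2):
    """Step 1 stand-in: a known function plays the role of the learned DNN."""
    return 3.0 * x1 * x1 + 2.0 * x2

grid = [i / 4.0 for i in range(-8, 9)]  # 17 points from -2.0 to 2.0

# Steps 2-3: hold x2 at a controlled value, vary x1, fit a one-variable form.
form, _, _, _ = single_variable_sr(grid, [generator(x, 1.0) for x in grid])
g = dict(LIBRARY)[form]

# Step 4: add x2. Refit the constants at several controlled x2 values, then
# treat each constant as a single-variable function of x2.
coeffs = [fit_form(g, grid, [generator(x, c) for x in grid]) for c in grid]
a_of_x2 = single_variable_sr(grid, [a for a, _, _ in coeffs])
b_of_x2 = single_variable_sr(grid, [b for _, b, _ in coeffs])

print(form, a_of_x2[:3], b_of_x2[:3])
```

On this toy generator the search selects the `square` form for `x1`, a constant close to 3 for its coefficient, and an `identity` form in `x2` with slope close to 2 for the offset, which together recover `3 * x1**2 + 2 * x2`. Each stage searches only over one-variable forms, which is the sense in which a decomposition of this kind can shrink the search space.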