LG · IM · AP · Feb 6, 2024

Multi-View Symbolic Regression

arXiv:2402.04298v4 · 8 citations · h-index: 24 · GECCO
Originality: Incremental advance
AI Analysis

This work addresses a limitation of traditional symbolic regression (SR) methods, which assume a single dataset, for researchers who work with multiple experimental datasets. It enables broader application of SR in fields such as astronomy, chemistry, and economics.

The authors tackle symbolic regression (SR) when multiple datasets from different experimental setups are available. They introduce Multi-View Symbolic Regression (MvSR), which outputs a general parametric solution that fits all datasets simultaneously. Results show that MvSR obtains the correct expression more frequently than single-dataset SR and is robust to hyperparameter changes. On real-world data from astronomy, chemistry, and economics, it recovers known expressions and proposes promising alternatives.

Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Frequently, however, the researcher is confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression, since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes multiple datasets into account simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each dataset independently and returns a parametric family of functions f(x; theta) capable of accurately fitting all datasets simultaneously. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry, and economics, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to changes in hyperparameters. On real-world data, it is able to capture the group behavior, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a wide range of experimental scenarios.
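
The multi-view idea described above (fit one functional form to each dataset with its own parameters, then judge the form by how well it fits all of them) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions for brevity: candidates are restricted to the form a*g(x) + b so each fit has a closed-form least-squares solution, the candidate list is enumerated by hand rather than searched, and per-view errors are aggregated with `max`. The names `fit_view`, `multi_view_score`, and `make_view` are hypothetical.

```python
import math


def fit_view(feature, xs, ys):
    """Least-squares fit of y ~ a*feature(x) + b on ONE dataset.

    Returns (a, b, mse). Closed form for simple linear regression on the
    transformed input feature(x).
    """
    n = len(xs)
    phi = [feature(x) for x in xs]
    mean_phi = sum(phi) / n
    mean_y = sum(ys) / n
    var = sum((p - mean_phi) ** 2 for p in phi)
    cov = sum((p - mean_phi) * (y - mean_y) for p, y in zip(phi, ys))
    a = cov / var
    b = mean_y - a * mean_phi
    mse = sum((a * p + b - y) ** 2 for p, y in zip(phi, ys)) / n
    return a, b, mse


def multi_view_score(feature, views, aggregate=max):
    """Fit the same functional form to every view, each with its own (a, b).

    The aggregated per-view error is the score of the functional form.
    Using max as the aggregator is an assumption of this sketch.
    """
    fits = [fit_view(feature, xs, ys) for xs, ys in views]
    score = aggregate(mse for _, _, mse in fits)
    params = [(a, b) for a, b, _ in fits]
    return score, params


def make_view(a, b, xs):
    """Synthetic 'experiment': y = a*exp(-x) + b with setup-specific a, b."""
    return xs, [a * math.exp(-x) + b for x in xs]


xs = [0.5 * i for i in range(7)]  # 0.0, 0.5, ..., 3.0
views = [make_view(2.0, 1.0, xs), make_view(-1.0, 0.5, xs), make_view(5.0, -3.0, xs)]

# Hand-written candidate forms (a real SR engine would search this space).
candidates = {
    "a*exp(-x)+b": lambda x: math.exp(-x),
    "a*x+b": lambda x: x,
    "a*x**2+b": lambda x: x * x,
}

scores = {name: multi_view_score(g, views)[0] for name, g in candidates.items()}
best = min(scores, key=scores.get)
best_params = multi_view_score(candidates[best], views)[1]
```

Here `best` is the form whose worst per-view error is smallest, and `best_params` holds one (a, b) pair per dataset, which is the sense in which the output is a parametric family f(x; theta) rather than a single fitted curve.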

Code Implementations: 1 repo