LG · Jun 24, 2025

Dimension Reduction for Symbolic Regression

arXiv:2506.19537v1 · 1 citation · h-index: 3 · AAAI
Originality: Incremental advance
AI Analysis

This paper addresses the problem of recovering complex symbolic formulas from finite samples, which is of interest to researchers in machine learning and symbolic computation. The advance is incremental because it builds on existing methods.

The paper tackles symbolic regression by proposing a dimension reduction method that identifies fixed combinations of variables and substitutes a single new variable for each, which significantly improves the performance of state-of-the-art algorithms.

Solutions of symbolic regression problems are expressions that are composed of input variables and operators from a finite set of function symbols. One measure for evaluating symbolic regression algorithms is their ability to recover formulae, up to symbolic equivalence, from finite samples. Not unexpectedly, the recovery problem becomes harder when the formula gets more complex, that is, when the number of variables and operators gets larger. Variables in naturally occurring symbolic formulas often appear only in fixed combinations. This can be exploited in symbolic regression by substituting one new variable for the combination, effectively reducing the number of variables. However, finding valid substitutions is challenging. Here, we address this challenge by searching over the expression space of small substitutions and testing for validity. The validity test is reduced to a test of functional dependence. The resulting iterative dimension reduction procedure can be used with any symbolic regression approach. We show that it reliably identifies valid substitutions and significantly boosts the performance of different types of state-of-the-art symbolic regression algorithms.
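The idea of testing a candidate substitution for validity can be illustrated with a minimal sketch. This is not the paper's procedure: it swaps in a simple nearest-neighbour heuristic for the functional dependence test. The names `dependence_score` and `substitute`, the gravity-like target, and the three candidate substitutions are all assumptions made for illustration. The intuition is that if a substitution is valid, samples that are close in the reduced input space should also have close target values.

```python
import numpy as np

def dependence_score(Z, y):
    """Heuristic (assumed, not from the paper): mean |y_i - y_nn(i)| over
    nearest neighbours in Z, divided by std(y). A low score suggests that
    y is a function of the columns of Z."""
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)           # put columns on one scale
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                       # a point is not its own neighbour
    nn = d2.argmin(axis=1)
    return float(np.abs(y - y[nn]).mean() / y.std())

def substitute(X, i, j, op):
    """Replace columns i and j of X by the single new variable op(x_i, x_j)."""
    z = op(X[:, i], X[:, j])
    rest = np.delete(X, [i, j], axis=1)
    return np.column_stack([z, rest])

# Synthetic data: x0 and x1 enter the target only through their product.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(400, 3))
y = X[:, 0] * X[:, 1] / X[:, 2] ** 2

# Hypothetical search space of small substitutions for the pair (x0, x1).
candidates = {
    "x0*x1": np.multiply,
    "x0/x1": np.divide,
    "x0-x1": np.subtract,
}
scores = {
    name: dependence_score(substitute(X, 0, 1, op), y)
    for name, op in candidates.items()
}
best = min(scores, key=scores.get)
print(scores, best)
```

On this synthetic data the product should receive the lowest score, since it is the only candidate that keeps all the information the target needs. In an iterative procedure, the accepted substitution would reduce the problem from three variables to two before the next round of search. Choosing the acceptance threshold, and separating a valid substitution from candidates that merely correlate with it, would need more care than this sketch takes.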

