LG | AI | Mar 13, 2023

Transformer-based Planning for Symbolic Regression

arXiv:2303.06833v5 | 93 citations | h-index: 46
Originality: Incremental advance
AI Analysis

The paper addresses the problem of generating accurate and simple mathematical expressions from data, a task relevant to machine learning researchers and practitioners. It represents an incremental improvement over existing transformer-based methods.

The paper tackles symbolic regression by proposing TPSR, a transformer-based planning strategy that integrates Monte Carlo Tree Search into decoding so that non-differentiable feedback, such as fitting accuracy and expression complexity, can guide equation generation. The reported result is a better fitting-complexity trade-off, extrapolation, and noise robustness than state-of-the-art methods.

Symbolic regression (SR) is a challenging task in machine learning that involves finding a mathematical expression for a function based on its values. Recent advancements in SR have demonstrated the effectiveness of pre-trained transformer-based models in generating equations as sequences, leveraging large-scale pre-training on synthetic datasets and offering notable advantages in terms of inference time over classical Genetic Programming (GP) methods. However, these models primarily rely on supervised pre-training goals borrowed from text generation and overlook equation discovery objectives like accuracy and complexity. To address this, we propose TPSR, a Transformer-based Planning strategy for Symbolic Regression that incorporates Monte Carlo Tree Search into the transformer decoding process. Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity, as external sources of knowledge into the transformer-based equation generation process. Extensive experiments on various datasets show that our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, extrapolation abilities, and robustness to noise.
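
The abstract does not state how fitting accuracy and complexity are combined into feedback for the search, so the following is only a minimal sketch of what such a non-differentiable reward could look like. It assumes accuracy is measured by normalized mean squared error and complexity by the number of tokens in the expression. The names `nmse`, `reward`, `lam`, and `max_tokens` are hypothetical, as are the functional form and default values; none of them come from the paper's code.

```python
import numpy as np


def nmse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    # Small constant avoids division by zero for constant targets.
    return mse / (np.var(y_true) + 1e-12)


def reward(y_true, y_pred, n_tokens, lam=0.1, max_tokens=200):
    """Hypothetical scalar reward for a candidate equation.

    Assumed form (an illustration, not the paper's stated formula):
        1 / (1 + NMSE) + lam * exp(-n_tokens / max_tokens)
    The first term rewards fitting accuracy, the second rewards short
    expressions. Neither term needs gradients, so a tree search can use it.
    """
    if not np.all(np.isfinite(y_pred)):
        # Candidates that evaluate to NaN or inf get the lowest reward.
        return 0.0
    accuracy_term = 1.0 / (1.0 + nmse(y_true, y_pred))
    complexity_term = lam * np.exp(-n_tokens / max_tokens)
    return float(accuracy_term + complexity_term)


# Toy data generated from y = x^2 + 1.
x = np.linspace(-2.0, 2.0, 50)
y = x**2 + 1.0

# Three hypothetical candidates, each with an assumed token count.
exact_short = reward(y, x**2 + 1.0, n_tokens=5)
exact_long = reward(y, (x**2 + 1.0) + 0.0 * x, n_tokens=11)
poor_short = reward(y, np.abs(x), n_tokens=2)

print(exact_short, exact_long, poor_short)
```

Under these assumptions, two candidates that fit equally well are ranked by length, while a short but inaccurate candidate scores below both. A planner such as Monte Carlo Tree Search could call a function of this kind on completed candidate equations to score branches of the decoding tree.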

Code Implementations: 1 repo