LG · Nov 6, 2025

Decomposable Neuro Symbolic Regression

arXiv:2511.04124v1 · 4.11 citations · h-index: 2

Originality: Incremental advance

AI Analysis

The paper addresses the trade-off between interpretability and accuracy in symbolic regression, which matters to researchers and practitioners in fields such as scientific modeling. The advance is incremental because it builds on existing techniques such as transformers and genetic algorithms.

The paper tackled the problem of symbolic regression methods producing overly complex or inaccurate expressions by introducing a decomposable method that generates interpretable multivariate expressions using transformers, genetic algorithms, and genetic programming. The result was lower or comparable interpolation and extrapolation errors compared to existing methods, with the approach consistently learning expressions that matched the original mathematical structure.

Symbolic regression (SR) models complex systems by discovering mathematical expressions that capture underlying relationships in observed data. However, most SR methods prioritize minimizing prediction error over identifying the governing equations, often producing overly complex or inaccurate expressions. To address this, we present a decomposable SR method that generates interpretable multivariate expressions leveraging transformer models, genetic algorithms (GAs), and genetic programming (GP). In particular, our explainable SR method distills a trained "opaque" regression model into mathematical expressions that serve as explanations of its computed function. Our method employs a Multi-Set Transformer to generate multiple univariate symbolic skeletons that characterize how each variable influences the opaque model's response. We then evaluate the generated skeletons' performance using a GA-based approach to select a subset of high-quality candidates before incrementally merging them via a GP-based cascade procedure that preserves their original skeleton structure. The final multivariate skeletons undergo coefficient optimization via a GA. We evaluated our method on problems with controlled and varying degrees of noise, demonstrating lower or comparable interpolation and extrapolation errors compared to two GP-based methods, three neural SR methods, and a hybrid approach. Unlike them, our approach consistently learned expressions that matched the original mathematical structure.
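
To make the idea of scoring a univariate skeleton with a GA more concrete, here is a minimal sketch. It is not the paper's implementation. The opaque model `opaque_model`, the skeleton form `c0 * sin(c1 * x) + c2`, the probing scheme (varying one input while holding the other fixed), and all GA settings are assumptions chosen for illustration. A plain elitist GA with uniform crossover and Gaussian mutation stands in for whatever GA variant the authors use.

```python
import numpy as np

def opaque_model(x1, x2):
    # Hypothetical stand-in for a trained black-box regressor.
    return 2.0 * np.sin(1.5 * x1) + x2 ** 2

def skeleton(x, c):
    # Assumed univariate skeleton with placeholder coefficients c0, c1, c2.
    return c[0] * np.sin(c[1] * x) + c[2]

def mse(c, x, y):
    # Mean squared error between the skeleton and the probed response.
    return float(np.mean((skeleton(x, c) - y) ** 2))

def fit_coefficients_ga(x, y, n_coef=3, pop_size=60, generations=200,
                        sigma=0.3, elite=5, seed=0):
    """Elitist GA over real-valued coefficient vectors (illustrative only)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-3.0, 3.0, size=(pop_size, n_coef))
    history = []  # best error seen at each generation
    for _ in range(generations):
        fitness = np.array([mse(c, x, y) for c in pop])
        order = np.argsort(fitness)
        pop = pop[order]                      # sort best-first
        history.append(float(fitness[order[0]]))
        parents = pop[: pop_size // 2]        # truncation selection
        n_children = pop_size - elite
        idx_a = rng.integers(0, len(parents), size=n_children)
        idx_b = rng.integers(0, len(parents), size=n_children)
        # Uniform crossover: each coefficient comes from parent a or b.
        mask = rng.random((n_children, n_coef)) < 0.5
        children = np.where(mask, parents[idx_a], parents[idx_b])
        # Gaussian mutation on every child.
        children = children + rng.normal(0.0, sigma, size=children.shape)
        # Elites are copied unchanged, so the best error never gets worse.
        pop = np.vstack([pop[:elite], children])
    fitness = np.array([mse(c, x, y) for c in pop])
    best = int(np.argmin(fitness))
    history.append(float(fitness[best]))
    return pop[best], history

# Probe the opaque model along x1 while holding x2 fixed (assumed scheme).
x1 = np.linspace(-3.0, 3.0, 200)
y = opaque_model(x1, 0.5)
coef, history = fit_coefficients_ga(x1, y)
print("fitted coefficients:", coef)
print("final MSE:", history[-1])
```

The final error gives one possible score for ranking candidate skeletons of the same variable. Repeating the probe with `x2` fixed at other values would show whether the same skeleton shape holds across the input space, with only the coefficients changing.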
