LGFeb 5, 2025

SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond

arXiv:2502.03367v113 citationsh-index: 29Has CodeIndustrial & Engineering Chemistry Research
Originality Incremental advance
AI Analysis

This work addresses efficiency and scalability issues in symbolic regression for researchers and practitioners in science and machine learning, offering an incremental improvement over prior methods.

The paper tackles the challenges of symbolic regression, such as high computational cost and poor scalability, by introducing SyMANTIC, a novel algorithm that efficiently discovers interpretable mathematical models from data, achieving similar or better accuracy at a fraction of the cost of existing methods.

Symbolic regression (SR) is an emerging branch of machine learning focused on discovering simple and interpretable mathematical expressions from data. Although a wide-variety of SR methods have been developed, they often face challenges such as high computational cost, poor scalability with respect to the number of input dimensions, fragility to noise, and an inability to balance accuracy and complexity. This work introduces SyMANTIC, a novel SR algorithm that addresses these challenges. SyMANTIC efficiently identifies (potentially several) low-dimensional descriptors from a large set of candidates (from $\sim 10^5$ to $\sim 10^{10}$ or more) through a unique combination of mutual information-based feature selection, adaptive feature expansion, and recursively applied $\ell_0$-based sparse regression. In addition, it employs an information-theoretic measure to produce an approximate set of Pareto-optimal equations, each offering the best-found accuracy for a given complexity. Furthermore, our open-source implementation of SyMANTIC, built on the PyTorch ecosystem, facilitates easy installation and GPU acceleration. We demonstrate the effectiveness of SyMANTIC across a range of problems, including synthetic examples, scientific benchmarks, real-world material property predictions, and chaotic dynamical system identification from small datasets. Extensive comparisons show that SyMANTIC uncovers similar or more accurate models at a fraction of the cost of existing SR methods.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes