Instruction Set and Language for Symbolic Regression
This work addresses a fundamental bottleneck in symbolic regression: the computation that researchers and practitioners spend evaluating redundant representations of the same expression.
The paper tackles the problem of structural redundancy in symbolic regression, where multiple representations encode the same expression, and presents IsalSR, a framework that collapses these into a single canonical form to improve efficiency.
A fundamental but largely unaddressed obstacle in symbolic regression (SR) is structural redundancy: every expression DAG admits many distinct node-numbering schemes that all encode the same expression, each occupying a separate point in the search space and consuming fitness evaluations without adding diversity. We present IsalSR (Instruction Set and Language for Symbolic Regression), a representation framework that encodes expression DAGs as strings over a compact two-tier alphabet and computes a pruned canonical string, a complete labeled-DAG isomorphism invariant, that collapses all equivalent representations into a single canonical form.
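To make the redundancy concrete, the following minimal sketch shows two node numberings of the same small expression DAG collapsing to one canonical key. It is not the IsalSR encoding or its pruned canonical string: it is a brute-force stand-in that takes the lexicographically smallest encoding over all node renumberings, which is feasible only for tiny graphs. The names `canonical_form` and `count_numberings`, the `(labels, edges)` representation, and the example expressions are assumptions made for illustration, and operand order is ignored for simplicity.

```python
from itertools import permutations

def encode(labels, edges, perm):
    """Encode the DAG after renumbering node i as perm[i]."""
    relabeled = [None] * len(labels)
    for old, new in enumerate(perm):
        relabeled[new] = labels[old]
    renumbered = sorted((perm[u], perm[v]) for u, v in edges)
    return (tuple(relabeled), tuple(renumbered))

def canonical_form(labels, edges):
    """Brute-force canonical key: the smallest encoding over all renumberings.

    Two labeled DAGs receive the same key exactly when they are isomorphic.
    The cost grows factorially with the node count (illustrative only).
    """
    n = len(labels)
    return min(encode(labels, edges, p) for p in permutations(range(n)))

def count_numberings(labels, edges):
    """Number of distinct encodings that all describe the same DAG."""
    n = len(labels)
    return len({encode(labels, edges, p) for p in permutations(range(n))})

# sin(x) + x*y, with edges pointing from operand to operator.
labels_a = ["x", "y", "sin", "*", "+"]
edges_a = [(0, 2), (0, 3), (1, 3), (2, 4), (3, 4)]

# The same expression with the nodes numbered in the opposite order.
labels_b = ["+", "*", "sin", "y", "x"]
edges_b = [(4, 2), (4, 1), (3, 1), (2, 0), (1, 0)]

# A different expression, sin(y) + x*y, for contrast.
labels_c = ["x", "y", "sin", "*", "+"]
edges_c = [(1, 2), (0, 3), (1, 3), (2, 4), (3, 4)]

same = canonical_form(labels_a, edges_a) == canonical_form(labels_b, edges_b)
different = canonical_form(labels_a, edges_a) != canonical_form(labels_c, edges_c)
```

In this sketch `same` and `different` both evaluate to `True`, and `count_numberings(labels_a, edges_a)` reports how many separate search-space points a naive node-numbered representation would spend on a single expression. A practical canonicalization has to avoid enumerating every permutation, which is the role the abstract assigns to the pruned canonical string.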