LG AI NE SC

Sep 14, 2024

Symbolic Regression with a Learned Concept Library

Arya Grayeli, Atharva Sehgal, Omar Costilla-Reyes, Miles Cranmer, Swarat Chaudhuri

arXiv:2409.09359v3

31.570 citations

h-index: 37

Has Code

Originality: Highly original

AI Analysis

This work addresses symbolic regression, the search for compact programmatic hypotheses that explain a dataset, with a novel hybrid approach that combines evolutionary algorithms with LLM guidance.

The authors enhance genetic algorithms with a learned library of abstract concepts, using zero-shot LLM queries to discover and evolve those concepts and to guide the search for new hypotheses. The method substantially outperforms state-of-the-art approaches on the Feynman equations and on a set of synthetic tasks. The authors also use it to discover a novel scaling law for LLMs.

We present a novel method for symbolic regression (SR), the task of searching for compact programmatic hypotheses that best explain a dataset. The problem is commonly solved using genetic algorithms; we show that we can enhance such methods by inducing a library of abstract textual concepts. Our algorithm, called LaSR, uses zero-shot queries to a large language model (LLM) to discover and evolve concepts occurring in known high-performing hypotheses. We discover new hypotheses using a mix of standard evolutionary steps and LLM-guided steps (obtained through zero-shot LLM queries) conditioned on discovered concepts. Once discovered, hypotheses are used in a new round of concept abstraction and evolution. We validate LaSR on the Feynman equations, a popular SR benchmark, as well as a set of synthetic tasks. On these benchmarks, LaSR substantially outperforms a variety of state-of-the-art SR approaches based on deep learning and evolutionary algorithms. Moreover, we show that LaSR can be used to discover a novel and powerful scaling law for LLMs.
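The loop described in the abstract (evolve hypotheses, abstract concepts from the best ones, then condition some mutation steps on those concepts) can be illustrated with a toy sketch. This is not the LaSR implementation: the two functions marked STAND-IN replace the zero-shot LLM queries with simple operator statistics, and every name here (`search`, `abstract_concepts`, `guided_mutate`, the `p_guided` parameter, the toy dataset) is an assumption made for illustration only.

```python
import math
import random

# Toy dataset from y = x**2 + sin(x); the search never sees this formula.
XS = [i / 4 for i in range(-8, 9)]
YS = [x * x + math.sin(x) for x in XS]

UNARY = {"sin": math.sin, "cos": math.cos, "sq": lambda v: v * v}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def random_tree(rng, depth=2, ops=None):
    """Sample a random expression tree; `ops` optionally restricts the operators."""
    ops = ops or list(UNARY) + list(BINARY)
    if depth == 0 or rng.random() < 0.3:
        return "x"
    op = rng.choice(ops)
    if op in UNARY:
        return (op, random_tree(rng, depth - 1, ops))
    return (op, random_tree(rng, depth - 1, ops), random_tree(rng, depth - 1, ops))

def evaluate(tree, x):
    if tree == "x":
        return x
    if tree[0] in UNARY:
        return UNARY[tree[0]](evaluate(tree[1], x))
    return BINARY[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def size(tree):
    return 1 if tree == "x" else 1 + sum(size(t) for t in tree[1:])

def loss(tree):
    """Mean squared error plus a small complexity penalty (favours compact hypotheses)."""
    try:
        mse = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)
    except (ValueError, OverflowError):  # e.g. sin(inf) or a float overflow
        return math.inf
    return mse + 0.01 * size(tree) if math.isfinite(mse) else math.inf

def mutate(tree, rng, ops=None):
    """Standard evolutionary step: replace a random subtree with a fresh one."""
    if tree == "x" or rng.random() < 0.3:
        return random_tree(rng, depth=2, ops=ops)
    i = rng.randrange(1, len(tree))
    return tree[:i] + (mutate(tree[i], rng, ops),) + tree[i + 1:]

def operators_in(tree):
    if tree == "x":
        return []
    return [tree[0]] + [op for sub in tree[1:] for op in operators_in(sub)]

def abstract_concepts(best):
    """STAND-IN for the LLM concept-abstraction query: returns the operators used
    by the current best hypotheses instead of textual concepts."""
    return sorted({op for tree in best for op in operators_in(tree)})

def guided_mutate(tree, concepts, rng):
    """STAND-IN for an LLM-guided step: mutate using only 'concept' operators."""
    return mutate(tree, rng, ops=concepts or None)

def search(generations=30, pop_size=40, p_guided=0.3, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    concepts, history = [], []
    for _ in range(generations):
        population.sort(key=loss)
        survivors = population[: pop_size // 4]      # elitist selection
        concepts = abstract_concepts(survivors[:5])  # new round of concept abstraction
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            if rng.random() < p_guided:              # concept-conditioned step
                children.append(guided_mutate(parent, concepts, rng))
            else:                                    # standard evolutionary step
                children.append(mutate(parent, rng))
        population = survivors + children
        history.append(loss(survivors[0]))
    return min(population, key=loss), concepts, history
```

Calling `search()` returns the best expression tree found, the final concept list, and the best loss per generation. Because the survivors are carried over unchanged, that loss never increases from one generation to the next. In the method described above the concepts are textual and both the abstraction and the guided steps come from zero-shot LLM queries, so this sketch only mirrors the control flow.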
