LG NE SC
May 12, 2025

Symbolic Regression with Multimodal Large Language Models and Kolmogorov Arnold Networks

Thomas R. Harvey, Fabian Ruehle, Kit Fraser-Taliente, James Halverson

arXiv:2505.07956v1
9.43 citations
h-index: 24

Originality: Incremental advance

AI Analysis

This is an incremental improvement for symbolic regression researchers, offering a method that eliminates the need for specifying function sets and extends to multivariate cases via KANs.

The paper tackles symbolic regression by using vision-capable LLMs to propose function ansätze from plots. The ansätze are fitted numerically and evolved with a genetic algorithm, and KANs extend the approach to multivariate functions, so no predefined function set has to be specified.

We present a novel approach to symbolic regression using vision-capable large language models (LLMs) and the ideas behind Google DeepMind's FunSearch. The LLM is given a plot of a univariate function and tasked with proposing an ansatz for that function. The free parameters of the ansatz are fitted using standard numerical optimisers, and a collection of such ansätze makes up the population of a genetic algorithm. Unlike other symbolic regression techniques, our method does not require the specification of a set of functions to be used in regression, but with appropriate prompt engineering, we can arbitrarily condition the generative step. By using Kolmogorov-Arnold Networks (KANs), we demonstrate that "univariate is all you need" for symbolic regression, and extend this method to multivariate functions by learning the univariate function on each edge of a trained KAN. The combined expression is then simplified by further processing with a language model.
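The fit-and-rank step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM proposal step is replaced by a hard-coded list of hypothetical candidate ansätze, the target data are synthetic, and `scipy.optimize.curve_fit` stands in for the "standard numerical optimisers" mentioned above. The names `population`, `fit_and_score`, and `ranked` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic univariate target (an assumption made for this illustration).
x = np.linspace(-2.0, 2.0, 200)
y = 2.0 * np.sin(1.5 * x) + 0.5

# Hypothetical ansätze standing in for LLM proposals: (name, function, n_params).
population = [
    ("linear", lambda x, a, b: a * x + b, 2),
    ("quadratic", lambda x, a, b, c: a * x**2 + b * x + c, 3),
    ("sine", lambda x, a, b, c: a * np.sin(b * x) + c, 3),
]

def fit_and_score(func, n_params, x, y):
    """Fit the free parameters of one ansatz and return (params, mse)."""
    try:
        params, _ = curve_fit(func, x, y, p0=np.ones(n_params), maxfev=10000)
    except RuntimeError:
        # The optimiser did not converge: treat the candidate as unfit.
        return None, float("inf")
    mse = float(np.mean((func(x, *params) - y) ** 2))
    if not np.isfinite(mse):
        mse = float("inf")
    return params, mse

# Score every candidate, then rank by error (lower is better).
scored = []
for name, func, n_params in population:
    params, mse = fit_and_score(func, n_params, x, y)
    scored.append((name, params, mse))
ranked = sorted(scored, key=lambda item: item[2])

# Selection step of a genetic algorithm: keep the fittest candidates.
survivors = ranked[:2]
for name, params, mse in ranked:
    print(f"{name:10s} mse={mse:.3e}")
```

In a full loop of the kind the abstract describes, the surviving ansätze would be passed back to the language model to seed the next round of proposals; that step is omitted here.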
