LG AI CL DB · Feb 22, 2024

OmniPred: Language Models as Universal Regressors

Xingyou Song, Oscar Li, Chansoo Lee, Bangding Yang, Daiyi Peng, Sagi Perel, Yutian Chen

arXiv:2402.14547v625.141 citations · h-index: 14 · Has Code · Trans. Mach. Learn. Res.

Originality: Highly original

AI Analysis

This work addresses a limitation facing researchers and practitioners in machine learning and optimization: regression methods are typically tied to a single task.

The paper tackles the problem of creating a universal regression framework applicable across arbitrary data formats, and demonstrates that language models trained on large-scale blackbox optimization data can achieve very precise numerical regression and significantly outperform traditional regression models.

Regression is a powerful tool to accurately predict the outcome metric of a system given a set of parameters, but has traditionally been restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ data from arbitrary formats. Using data sourced from Google Vizier, one of the largest proprietary blackbox optimization databases in the world, our extensive experiments demonstrate that language models are capable of very precise numerical regression using only textual representations of mathematical parameters and values, and if given the opportunity to train at scale over multiple tasks, can significantly outperform traditional regression models.
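To make "textual representations of mathematical parameters and values" concrete, here is a minimal sketch of how an $(x,y)$ pair might be turned into text for a language model. The abstract does not specify the format, so everything here is an assumption made for illustration: the `key:value` parameter layout, the sign/mantissa/exponent token scheme, and the helper names `serialize_x`, `encode_y`, `decode_y`, and `make_example` are all hypothetical.

```python
# Hypothetical text serialization for (x, y) regression data.
# The format below is an illustrative assumption, not the paper's confirmed scheme.

def serialize_x(params: dict) -> str:
    """Render parameters as a flat 'key:value' string, sorted by key for determinism."""
    return ",".join(f"{k}:{params[k]}" for k in sorted(params))


def encode_y(y: float, digits: int = 4) -> list:
    """Encode a float as tokens: sign, `digits` mantissa digits, base-10 exponent.

    The encoded value equals sign * int(mantissa_digits) * 10**exponent.
    """
    sign = "<->" if y < 0 else "<+>"
    if y == 0:
        return [sign] + ["<0>"] * digits + ["<E0>"]
    # Scientific notation with `digits` significant digits, e.g. '1.234e+00'.
    mantissa, exp = f"{abs(y):.{digits - 1}e}".split("e")
    mantissa_digits = mantissa.replace(".", "")
    # Shift the exponent so the mantissa can be read as an integer.
    exponent = int(exp) - (digits - 1)
    return [sign] + [f"<{d}>" for d in mantissa_digits] + [f"<E{exponent}>"]


def decode_y(tokens: list) -> float:
    """Invert encode_y: rebuild the float from sign, digit, and exponent tokens."""
    sign = "-" if tokens[0] == "<->" else ""
    mantissa_digits = "".join(t[1:-1] for t in tokens[1:-1])
    exponent = tokens[-1][2:-1]
    return float(f"{sign}{mantissa_digits}e{exponent}")


def make_example(params: dict, y: float) -> tuple:
    """Build a (prompt, target) text pair that a sequence model could be trained on."""
    return serialize_x(params), "".join(encode_y(y))


if __name__ == "__main__":
    prompt, target = make_example({"lr": 0.01, "batch": 32}, 1.234)
    print(prompt)
    print(target)
```

In a scheme like this, the model reads the serialized parameters as its prompt and is trained to emit the target tokens. Decoding the sampled tokens with `decode_y` gives a numeric prediction, rounded to the chosen number of significant digits.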
