Decoding-based Regression
This work gives theoretical grounds and empirical evidence for using language-model decoders as regression heads, and shows they are flexible enough for tasks such as density estimation.
The paper studies numeric regression with language models, where predictions are represented as decoded strings. It shows that decoder-based heads perform as well as standard pointwise heads on standard regression tasks and can also capture smooth numeric distributions.
Language models have recently been shown to be capable of regression, with numeric predictions represented as decoded strings. In this work, we provide theoretical grounds for this capability and investigate the utility of causal sequence decoding models as numeric regression heads given any feature representation. We find that, despite being trained in the usual way (next-token prediction via cross-entropy loss), decoder-based heads are as performant as standard pointwise heads when benchmarked over standard regression tasks, while being flexible enough to capture smooth numeric distributions, such as in the task of density estimation.
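
To make the idea of "numeric predictions represented as decoded strings" concrete, the following is a minimal sketch, not the paper's implementation. It assumes targets are normalized to [0, 1) and written as a fixed number of digit tokens. The names `encode`, `decode`, and `implied_density`, and the defaults `BASE` and `LENGTH`, are hypothetical choices made for illustration. The last function shows how per-step token probabilities, as a decoder head would produce, imply a histogram density over the target.

```python
from typing import List, Sequence

BASE = 10    # assumed number of digit tokens in the vocabulary
LENGTH = 4   # assumed number of tokens used to write one target value


def encode(y: float, base: int = BASE, length: int = LENGTH) -> List[int]:
    """Write y in [0, 1) as `length` digit tokens, most significant first."""
    if not 0.0 <= y < 1.0:
        raise ValueError("y must lie in [0, 1)")
    # Index of the bin containing y, out of base**length equal-width bins.
    index = int(y * base**length)
    tokens = []
    for _ in range(length):
        tokens.append(index % base)
        index //= base
    return tokens[::-1]


def decode(tokens: Sequence[int], base: int = BASE) -> float:
    """Map digit tokens back to the left edge of their bin in [0, 1)."""
    index = 0
    for t in tokens:
        index = index * base + t
    return index / base ** len(tokens)


def implied_density(step_probs: Sequence[Sequence[float]],
                    tokens: Sequence[int],
                    base: int = BASE) -> float:
    """Density on the bin named by `tokens`.

    step_probs[i][t] is the probability of token t at step i given the
    earlier tokens. The sequence probability is the product over steps,
    and dividing by the bin width base**(-len(tokens)) gives a density.
    """
    prob = 1.0
    for probs, t in zip(step_probs, tokens):
        prob *= probs[t]
    return prob * base ** len(tokens)
```

Under these assumptions, training with cross-entropy on the tokens of `encode(y)` is ordinary next-token prediction. A decoder that assigns uniform probability at every step implies a density of 1 everywhere on [0, 1), which is the uniform distribution.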