CL · LG · GN · May 13, 2025

Revealing economic facts: LLMs know more than they say

Marcus Buckmann, Quynh Anh Nguyen, Edward Hill

arXiv:2505.08662v1 · 3 citations · h-index: 3 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of extracting richer economic information from LLMs for tasks like data imputation, though it is incremental in applying existing methods to new data types.

The study estimates economic and financial statistics from the hidden states of large language models (LLMs). A linear model trained on these states outperforms the models' text outputs, and only a few dozen labelled examples are needed for training.

We investigate whether the hidden states of large language models (LLMs) can be used to estimate and impute economic and financial statistics. Focusing on county-level (e.g. unemployment) and firm-level (e.g. total assets) variables, we show that a simple linear model trained on the hidden states of open-source LLMs outperforms the models' text outputs. This suggests that hidden states capture richer economic information than the responses of the LLMs reveal directly. A learning curve analysis indicates that only a few dozen labelled examples are sufficient for training. We also propose a transfer learning method that improves estimation accuracy without requiring any labelled data for the target variable. Finally, we demonstrate the practical utility of hidden-state representations in super-resolution and data imputation tasks.
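
The "simple linear model trained on the hidden states" described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of ridge regression, the regularisation strength `alpha`, the helper names `fit_ridge_probe` and `predict`, and above all the data, which are synthetic random vectors standing in for hidden states that would in practice be extracted from an open-source LLM.

```python
import numpy as np


def fit_ridge_probe(H, y, alpha=1.0):
    """Fit y ~ H @ w + b by ridge regression (closed form).

    H: (n_samples, dim) matrix, one hidden-state vector per entity.
    y: (n_samples,) target statistic, e.g. an unemployment rate.
    alpha: L2 penalty on w (hypothetical default; tune in practice).
    """
    h_mean = H.mean(axis=0)
    y_mean = y.mean()
    Hc = H - h_mean  # centre features so the intercept is not penalised
    A = Hc.T @ Hc + alpha * np.eye(H.shape[1])
    w = np.linalg.solve(A, Hc.T @ (y - y_mean))
    b = y_mean - h_mean @ w
    return w, b


def predict(H, w, b):
    """Apply the fitted linear probe to new hidden states."""
    return H @ w + b


# Synthetic stand-in data (NOT real LLM hidden states).
rng = np.random.default_rng(0)
n_train, n_test, dim = 48, 200, 16  # "a few dozen" labelled examples
true_w = rng.normal(size=dim)
H_train = rng.normal(size=(n_train, dim))
H_test = rng.normal(size=(n_test, dim))
y_train = H_train @ true_w + 0.1 * rng.normal(size=n_train)
y_test = H_test @ true_w + 0.1 * rng.normal(size=n_test)

w, b = fit_ridge_probe(H_train, y_train, alpha=1.0)
pred = predict(H_test, w, b)
corr = np.corrcoef(pred, y_test)[0, 1]
print(f"held-out correlation: {corr:.3f}")
```

Real hidden states typically have thousands of dimensions, far more than the number of labelled examples, which is why the sketch uses a regularised fit rather than ordinary least squares; the paper's actual estimator and hyperparameters may differ.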
