LG AI CL | Jan 31, 2025

Should You Use Your Large Language Model to Explore or Exploit?

arXiv:2502.00225v217.97 citations | h-index: 38

Originality: Synthesis-oriented

AI Analysis

This work addresses the use of LLMs for decision-making in bandit tasks, but it is incremental, showing limited practical gains over simpler methods.

The paper evaluates large language models (LLMs) on the exploration-exploitation tradeoff faced by decision-making agents. It finds that LLMs often struggle to exploit and, even with in-context mitigations, perform worse than a simple linear regression on small-scale tasks, but that they can help explore large action spaces with inherent semantics.

We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while the current LLMs often struggle to exploit, in-context mitigations may be used to substantially improve performance for small-scale tasks. However, even then, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.
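
To make the "simple linear regression" comparison concrete, here is a minimal sketch of a contextual bandit in which the exploit step is a per-arm least-squares fit. It is not the authors' experimental setup: the two-arm environment, the one-dimensional context, the epsilon-greedy exploration rule, and every name in the code (`fit_line`, `exploit`, `run`, `true_params`) are assumptions made for illustration only.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    if n == 0:
        return 0.0, 0.0  # no data yet: predict zero reward
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0.0:
        return 0.0, mean_y  # a single distinct context: fall back to the mean
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def exploit(history, context):
    """Pick the arm whose fitted line predicts the highest reward."""
    best_arm, best_pred = None, float("-inf")
    for arm, (xs, ys) in history.items():
        a, b = fit_line(xs, ys)
        pred = a * context + b
        if pred > best_pred:
            best_arm, best_pred = arm, pred
    return best_arm


def run(rounds=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy loop on a hypothetical two-arm linear environment."""
    rng = random.Random(seed)
    # Assumed ground truth: (slope, intercept) of the mean reward per arm.
    true_params = {0: (1.0, 0.0), 1: (-1.0, 1.0)}
    history = {arm: ([], []) for arm in true_params}
    total_reward = 0.0
    for _ in range(rounds):
        context = rng.random()
        if rng.random() < epsilon:
            arm = rng.choice(list(true_params))  # explore uniformly at random
        else:
            arm = exploit(history, context)      # exploit the regression fit
        slope, intercept = true_params[arm]
        reward = slope * context + intercept + rng.gauss(0.0, 0.1)
        history[arm][0].append(context)
        history[arm][1].append(reward)
        total_reward += reward
    return history, total_reward / rounds
```

Calling `run()` returns the per-arm history and the average reward per round. In this toy environment a uniformly random policy averages about 0.5 and the best possible policy about 0.75, so the gap between the two gives a rough sense of how much the regression-based exploit step contributes.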
