LG AI CL Jan 31, 2025

Should You Use Your Large Language Model to Explore or Exploit?

arXiv:2502.00225v2 7 citations h-index: 38
Originality: Synthesis-oriented
AI Analysis

This work studies the use of LLMs for decision-making in bandit tasks. It is incremental: the LLMs show limited practical gains over simpler methods.

The paper evaluates large language models (LLMs) on the exploration-exploitation tradeoff faced by decision-making agents. It finds that LLMs often struggle to exploit and, even with in-context mitigations on small-scale tasks, perform worse than a simple linear regression. They do, however, help explore large action spaces with inherent semantics by suggesting suitable candidates.

We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while the current LLMs often struggle to exploit, in-context mitigations may be used to substantially improve performance for small-scale tasks. However, even then, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.
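To make the "exploit" step concrete, here is a minimal sketch of a linear-regression baseline in a contextual bandit with a one-dimensional context. It is an illustration only, not the paper's experimental setup: the function names `fit_line` and `exploit`, the arm names, and the toy history are all assumptions introduced here. Each arm gets its own least-squares line fitted to its observed (context, reward) pairs, and the agent picks the arm with the highest predicted reward.

```python
from statistics import mean


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All contexts identical: fall back to predicting the mean reward.
        return y_bar, 0.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def exploit(history, context):
    """Return the arm with the highest predicted reward for `context`.

    history: dict mapping arm name -> non-empty list of (context, reward) pairs.
    """
    def predicted(arm):
        xs, ys = zip(*history[arm])
        intercept, slope = fit_line(xs, ys)
        return intercept + slope * context

    return max(history, key=predicted)


# Hypothetical toy data: two arms whose rewards depend on the context
# in opposite directions.
history = {
    "arm_a": [(0.0, 1.0), (1.0, 0.0)],  # reward falls as context grows
    "arm_b": [(0.0, 0.0), (1.0, 1.0)],  # reward rises as context grows
}

low_choice = exploit(history, 0.1)
high_choice = exploit(history, 0.9)
```

This sketch covers exploitation only. A full agent would pair it with an exploration rule that decides when to try arms with little or no history.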
