Should You Use Your Large Language Model to Explore or Exploit?
This work addresses the use of LLMs for decision-making in bandit tasks, but the contribution is incremental: it shows limited practical gains over simpler methods.
The paper evaluates large language models (LLMs) on the exploration-exploitation tradeoff faced by decision-making agents. It finds that LLMs often struggle to exploit and perform worse than linear regression on small-scale tasks, but that they can help explore large action spaces.
We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while current LLMs often struggle to exploit, in-context mitigations may be used to substantially improve performance for small-scale tasks. However, even then, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.
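To make the linear regression comparison concrete, the following is a minimal sketch of what an exploit step with a linear baseline could look like in a contextual bandit. It is not the paper's code: the function names, the choice of one least-squares model per action, and the input layout are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_rewards(contexts, actions, rewards, n_actions):
    """Fit one least-squares reward model per action (illustrative baseline).

    contexts: (n, d) array of observed contexts
    actions:  (n,) array of the action index taken in each round
    rewards:  (n,) array of the observed rewards
    Returns an (n_actions, d + 1) weight matrix; the last column is the intercept.
    """
    contexts = np.asarray(contexts, dtype=float)
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    # Append a constant feature so each model can learn an intercept.
    X = np.hstack([contexts, np.ones((len(contexts), 1))])
    weights = np.zeros((n_actions, X.shape[1]))
    for a in range(n_actions):
        mask = actions == a
        if mask.any():
            # Actions that were never tried keep all-zero weights.
            weights[a], *_ = np.linalg.lstsq(X[mask], rewards[mask], rcond=None)
    return weights


def exploit(weights, context):
    """Return the action with the highest predicted reward for this context."""
    x = np.append(np.asarray(context, dtype=float), 1.0)
    return int(np.argmax(weights @ x))
```

Given a logged history, `exploit(fit_linear_rewards(contexts, actions, rewards, n_actions), new_context)` returns the greedy action. The sketch covers exploitation only; choosing which actions to try in the first place, the part where the abstract reports that LLMs help, is outside its scope.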