LG | AI | CL | ME | ML | Feb 1, 2024

Efficient Exploration for LLMs

arXiv:2402.00396v2 | 48 citations | h-index: 12 | ICML
AI Analysis

This work addresses the cost and effort of collecting human feedback for large language models. It represents an incremental improvement in exploration methods.

The paper tackles the problem of gathering human feedback efficiently to improve large language models. An agent sequentially generates queries using double Thompson sampling, with an epistemic neural network providing uncertainty estimates, and reaches high performance with far fewer queries.

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
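
To make the query-selection step concrete, here is a minimal sketch of double Thompson sampling over a set of candidate responses. It is not the authors' implementation. It assumes a small ensemble of reward estimates stands in for the epistemic neural network, so that picking an ensemble member at random plays the role of sampling a reward model. The function name `double_thompson_sample`, the `ensemble_scores` layout, and the `max_tries` cap are all hypothetical choices made for illustration.

```python
import random


def double_thompson_sample(ensemble_scores, rng, max_tries=30):
    """Pick two distinct response indices for one pairwise preference query.

    ensemble_scores[m][i] is the reward that (hypothetical) ensemble member m
    assigns to candidate response i. Choosing a member uniformly at random is
    an assumed stand-in for sampling a reward model from the posterior.
    """
    num_responses = len(ensemble_scores[0])
    if num_responses < 2:
        raise ValueError("need at least two candidate responses")

    def best_under_sampled_model():
        # Sample one plausible reward model, then act greedily under it.
        member = rng.choice(ensemble_scores)
        return max(range(num_responses), key=lambda i: member[i])

    first = best_under_sampled_model()

    # Resample models until one prefers a different response.
    for _ in range(max_tries):
        second = best_under_sampled_model()
        if second != first:
            return first, second

    # Fallback (an assumption of this sketch): if every sampled model agrees,
    # choose the second response uniformly from the remaining candidates.
    others = [i for i in range(num_responses) if i != first]
    return first, rng.choice(others)


rng = random.Random(0)
scores = [
    [0.90, 0.10, 0.20],  # member 0 prefers response 0
    [0.20, 0.80, 0.10],  # member 1 prefers response 1
    [0.85, 0.30, 0.10],  # member 2 prefers response 0
]
pair = double_thompson_sample(scores, rng)
```

In a full loop, the selected pair would be shown to a rater and the resulting preference used to update the reward model before the next query. When the ensemble members disagree about which response is best, both sampled responses are plausible winners, which is what focuses queries on informative comparisons.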

