CL, Aug 23, 2024

In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting

Peking U
arXiv:2408.13028v1
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving analogy ability in LLMs for tasks like incomplete utterance rewriting, offering an incremental advancement in example selection techniques.

The paper tackles example selection for in-context learning in large language models. It proposes a reinforcement learning framework that uses LLM feedback to optimize the selector. The authors report significant improvements over existing example selection methods and advantages over supervised fine-tuning in few-shot settings.

In-context learning (ICL) with large language models (LLMs), in which the LLM makes predictions based only on instructions augmented with a few examples, has attracted increasing attention in the community. Existing example selection methods for ICL use sparse or dense retrievers and achieve effective performance. However, these methods do not use direct feedback from the LLM to train the retriever, and the selected examples do not necessarily improve the analogy ability of the LLM. To tackle this, we propose a policy-based reinforcement learning framework for example selection (RLS), which consists of a language model (LM) selector and an LLM generator. The LM selector encodes the candidate examples into dense representations and selects the top-k examples as the demonstration for the LLM. The outputs of the LLM are used to compute the reward and the policy gradient that optimize the LM selector. We conduct experiments on different datasets and significantly outperform existing example selection methods. Moreover, our approach shows advantages over supervised fine-tuning (SFT) models in the few-shot setting. Further experiments show that the balance between the abundance of the examples and their similarity to the test case is important for the ICL performance of the LLM.
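The selector-generator loop described in the abstract can be illustrated with a small REINFORCE-style sketch. This is not the authors' implementation: the feature vectors, the linear scorer, the sequential sampling without replacement, the running-mean baseline, and the `toy_reward` function (a stand-in for feedback computed from LLM outputs) are all assumptions made for illustration, and every name below is hypothetical.

```python
import math
import random

# Hypothetical candidate features. In the paper's setting these would be
# dense representations produced by the LM selector.
FEATS = [
    [1.0, 0.2, 0.0],
    [0.9, 0.0, 0.3],
    [0.1, 1.0, 0.0],
    [0.0, 0.8, 0.2],
    [0.2, 0.1, 1.0],
    [0.0, 0.3, 0.9],
]
HELPFUL = {0, 1}  # assumption: the candidates that actually help the generator


def toy_reward(chosen):
    """Stand-in for LLM feedback: fraction of chosen examples that are helpful."""
    return sum(i in HELPFUL for i in chosen) / len(chosen)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(w, feats):
    """Linear scorer: one logit per candidate."""
    return [sum(wd * fd for wd, fd in zip(w, f)) for f in feats]


def sample_k(w, feats, k, rng):
    """Sample k distinct candidates one at a time.

    Returns the chosen indices and the gradient of the log-probability
    of the whole selection with respect to w.
    """
    remaining = list(range(len(feats)))
    chosen, grad = [], [0.0] * len(w)
    for _ in range(k):
        probs = softmax(scores(w, [feats[i] for i in remaining]))
        pick = rng.choices(range(len(remaining)), weights=probs)[0]
        idx = remaining[pick]
        # For a linear scorer, grad log softmax = phi(chosen) - E_p[phi].
        for d in range(len(w)):
            expected = sum(p * feats[i][d] for p, i in zip(probs, remaining))
            grad[d] += feats[idx][d] - expected
        chosen.append(idx)
        remaining.pop(pick)
    return chosen, grad


def train(feats, reward_fn, k=2, steps=500, lr=0.2, seed=0):
    """Policy-gradient updates of the selector weights from reward feedback."""
    rng = random.Random(seed)
    w = [0.0] * len(feats[0])
    baseline = 0.0
    for step in range(1, steps + 1):
        chosen, grad = sample_k(w, feats, k, rng)
        reward = reward_fn(chosen)
        advantage = reward - baseline  # baseline reduces gradient variance
        w = [wd + lr * advantage * gd for wd, gd in zip(w, grad)]
        baseline += (reward - baseline) / step  # running mean of rewards
    return w


def top_k(w, feats, k):
    """Deterministic selection at inference time: the k highest-scoring candidates."""
    s = scores(w, feats)
    return sorted(range(len(feats)), key=lambda i: s[i], reverse=True)[:k]
```

Calling `train(FEATS, toy_reward)` and then `top_k(w, FEATS, 2)` selects the demonstration set the trained selector prefers. In a real pipeline the reward would presumably be computed by comparing the generator's rewritten utterance with a reference, which the toy function only imitates.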
