CL, AI, LG | Aug 23, 2023

Prompt-Based Length Controlled Generation with Reinforcement Learning

arXiv:2308.12030v2 | 21 citations | h-index: 35
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient and precise length control in LLMs for real-world applications such as generating answers or essays of a desired length. The contribution is incremental, as it builds on existing prompt-based and RL techniques.

The paper tackles length-controlled generation in large language models (LLMs) by proposing a prompt-based method trained with reinforcement learning, which significantly improves length-control accuracy on summarization with datasets like CNNDM and NYT.

Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks. Length-controlled generation with LLMs has emerged as an important topic, since it enables users to fully leverage the capability of LLMs in more real-world scenarios, such as generating a proper answer or essay of a desired length. In addition, autoregressive generation in LLMs is extremely time-consuming, and the ability to control the generated length can reduce the inference cost by limiting the length. Therefore, we propose a prompt-based length control method to achieve high-accuracy length-controlled generation. In particular, we adopt reinforcement learning with the reward signal given by either trainable or rule-based reward models, which further enhances the length-control ability of LLMs by rewarding outputs that follow pre-defined control instructions. To enable rule-based inference, we also introduce a standard prompt extractor to collect the standard control information from users' input. Experiments show that our method significantly improves the accuracy of prompt-based length control for the summarization task on popular datasets like CNNDM and NYT. Both the standard prompt extractor and the RL-tuned model show strong generalization ability to unseen control prompt templates.
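
To make the idea of a rule-based reward concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the reward formula or the extractor design, so the regex-based `extract_control` is a hypothetical stand-in for the standard prompt extractor, and `length_reward` uses an assumed penalty equal to the distance between the output length and the requested range. All names here (`LengthControl`, `extract_control`, `length_reward`, `count_words`) are invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LengthControl:
    """Standardized control information: an inclusive [low, high] range."""
    kind: str                 # "equal", "less", "more", or "between"
    low: Optional[int]        # None means no lower bound
    high: Optional[int]       # None means no upper bound


# Hypothetical prompt templates; order matters ("no more than" must be
# tried before "more than"). All bounds are treated as inclusive for simplicity.
_UNIT = r"(?:words|tokens)\b"
_PATTERNS = [
    ("between", re.compile(r"\bbetween\s+(\d+)\s+and\s+(\d+)\s+" + _UNIT, re.I)),
    ("less", re.compile(
        r"\b(?:less than|fewer than|at most|no more than|within)\s+(\d+)\s+" + _UNIT,
        re.I)),
    ("more", re.compile(r"\b(?:more than|at least)\s+(\d+)\s+" + _UNIT, re.I)),
    ("equal", re.compile(r"\b(?:exactly|with|in)\s+(\d+)\s+" + _UNIT, re.I)),
]


def extract_control(prompt: str) -> Optional[LengthControl]:
    """Regex stand-in for a prompt extractor; returns None if no instruction is found."""
    for kind, pattern in _PATTERNS:
        match = pattern.search(prompt)
        if match is None:
            continue
        nums = [int(g) for g in match.groups()]
        if kind == "between":
            low, high = sorted(nums)
            return LengthControl(kind, low, high)
        if kind == "less":
            return LengthControl(kind, None, nums[0])
        if kind == "more":
            return LengthControl(kind, nums[0], None)
        return LengthControl(kind, nums[0], nums[0])
    return None


def count_words(text: str) -> int:
    """Whitespace word count; a real system would likely count tokenizer tokens."""
    return len(text.split())


def length_reward(control: LengthControl, output_length: int) -> float:
    """Assumed reward: 0 inside the requested range, negative distance outside it."""
    if control.low is not None and output_length < control.low:
        return -float(control.low - output_length)
    if control.high is not None and output_length > control.high:
        return -float(output_length - control.high)
    return 0.0


if __name__ == "__main__":
    control = extract_control("Summarize the article in 50 words.")
    summary = "word " * 42
    print(control, length_reward(control, count_words(summary)))
```

In an RL fine-tuning loop, a scalar like this would be computed per sampled output and passed to the policy-gradient update as the reward. A trainable reward model would replace `length_reward` with a learned scorer.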

