CL, AI, LG | Jun 22, 2024

RuleR: Improving LLM Controllability by Rule-based Data Recycling

arXiv:2406.15938v4 | 15 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more controllable LLMs, which matters for both performance and user experience. It offers a cost-effective alternative to dataset curation by human experts or proprietary LLMs, though the contribution is incremental.

The paper tackles the problem of improving controllability in large language models (LLMs) by proposing RuleR, a rule-based data augmentation method that recycles existing data to create new training tasks, resulting in enhanced controllability while preserving general instruction-following capabilities.

Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which incurs additional cost. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method that incorporates multiple constraints into the original data samples according to predefined rules, creating new training tasks that consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to the responses and appending the corresponding rule-instructions to the original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities.
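The recycling idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Sample` type, the two rules (`uppercase_rule`, `end_marker_rule`), and the `recycle` function are hypothetical names chosen for this example, and the actual rule set in the paper may differ. Each rule edits a response and returns a matching rule-instruction, which is appended to the original instruction.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    instruction: str
    response: str


# A hypothetical rule maps a response to (rule_instruction, edited_response).
Rule = Callable[[str], Tuple[str, str]]


def uppercase_rule(response: str) -> Tuple[str, str]:
    # Illustrative constraint: the whole response must be uppercase.
    return "Write the entire response in uppercase letters.", response.upper()


def end_marker_rule(response: str) -> Tuple[str, str]:
    # Illustrative constraint: the response must end with a fixed marker word.
    marker = "END"
    return f'Finish the response with the word "{marker}".', f"{response.rstrip()} {marker}"


def recycle(sample: Sample, rules: List[Rule], rng: random.Random, max_rules: int = 2) -> Sample:
    """Build a new constrained training sample from an existing one."""
    # Pick between 1 and max_rules distinct rules (an assumed sampling scheme).
    k = rng.randint(1, min(max_rules, len(rules)))
    instruction, response = sample.instruction, sample.response
    for rule in rng.sample(rules, k):
        rule_instruction, response = rule(response)    # edit the response
        instruction = f"{instruction} {rule_instruction}"  # append the rule-instruction
    return Sample(instruction, response)


if __name__ == "__main__":
    original = Sample("Name a primary color.", "Red is a primary color.")
    print(recycle(original, [uppercase_rule, end_marker_rule], random.Random(0)))
```

The original sample is left untouched, so the recycled samples can be added alongside the existing data rather than replacing it; no human annotator or external LLM is involved in producing them.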

Code Implementations: 3 repos
