CL · AI · Jan 9, 2025

Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models

arXiv:2501.04945v4 · 9 citations · h-index: 22 · Has Code · ACL
AI Analysis

This work addresses a specific, incremental improvement in LLM instruction-following for tasks involving soft constraints, which is relevant for applications requiring nuanced control over model outputs.

The paper aims to improve large language models' ability to follow soft constraints in instructions that combine multiple constraints. It develops a pipeline for automatic dataset construction and a curriculum learning method that orders training by the number of constraints, and it evaluates the resulting improvement experimentally.

It is crucial for large language models (LLMs) to follow instructions that involve multiple constraints. However, enhancing LLMs' ability to follow soft constraints remains unexplored. To bridge the gap, we first design a pipeline to construct datasets with high-quality outputs automatically. Additionally, to fully utilize the positive and negative samples generated during the data construction process, we choose Direct Preference Optimization (DPO) as the training method. Furthermore, taking into account the difficulty of soft constraints indicated by the number of constraints, we design a curriculum learning training paradigm based on the constraint quantity. We experimentally evaluate the effectiveness of our methods in improving LLMs' soft constraint following ability and analyze the factors driving the improvements. The datasets and code are publicly available at https://github.com/Rainier-rq/FollowSoftConstraint.
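
The two training ideas named in the abstract, DPO on positive/negative pairs and a curriculum ordered by constraint count, can be illustrated with a minimal sketch. This is not the authors' implementation: the `PreferencePair` class, the `curriculum_stages` helper, the strict easiest-first staging, and the `beta=0.1` default are all assumptions made for illustration, and only the per-pair loss follows the standard published DPO formulation.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical container for one training example (not from the paper's code)."""
    prompt: str
    chosen: str           # positive sample: output that follows the constraints
    rejected: str         # negative sample: output that misses some constraint
    num_constraints: int  # used here as the difficulty signal


def curriculum_stages(pairs):
    """Group pairs by constraint count, fewest constraints first.

    Assumption: a strict easy-to-hard staging; the abstract only says the
    curriculum is based on constraint quantity.
    """
    stages = {}
    for pair in pairs:
        stages.setdefault(pair.num_constraints, []).append(pair)
    return [stages[count] for count in sorted(stages)]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single pair: -log(sigmoid(beta * margin)).

    Inputs are sequence log-probabilities under the trained policy and the
    frozen reference model. beta=0.1 is an illustrative default.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) = softplus(-x), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

In a training loop, one would iterate over the stages in order and minimize the mean `dpo_loss` over each stage's pairs; the loss falls as the policy assigns relatively more probability to the chosen output than the reference model does.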

