CL | AI | Nov 9, 2024

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

arXiv:2411.06208v3 | 30 citations | h-index: 19 | ACL
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of complex instruction following for LLM-based agents and applications, and represents an incremental advance in alignment techniques.

The paper tackles the problem of improving large language models' ability to follow complex instructions. It introduces TRACE, a benchmark with 120K training and 1K evaluation examples, and proposes IOPO, an alignment method that considers both input and output preferences. IOPO improves on baseline methods by up to 8.15% on in-domain data and 6.29% on out-of-domain data.

In the realm of large language models (LLMs), the ability of models to accurately follow instructions is paramount as more agents and applications leverage LLMs for construction, where the complexity of instructions is rapidly increasing. However, on the one hand, there is only a certain amount of complex instruction evaluation data; on the other hand, there are no dedicated algorithms to improve the ability to follow complex instructions. To this end, this paper introduces TRACE, a benchmark for improving and evaluating the complex instruction-following ability, which consists of 120K training data and 1K evaluation data. Furthermore, we propose the IOPO (Input-Output Preference Optimization) alignment method, which takes both input and output preference pairs into consideration, so that LLMs not only rapidly align with response preferences but also meticulously explore the instruction preferences. Extensive experiments on both in-domain and out-of-domain datasets confirm the effectiveness of IOPO, showing improvements of 8.15% and 2.18% on in-domain data and 6.29% and 3.13% on out-of-domain data compared to SFT and DPO, respectively.
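To make the contrast between output preferences and input preferences concrete, here is a minimal sketch in plain Python. The `dpo_loss` function follows the standard DPO objective for a single preference pair. The `input_output_loss` function is purely illustrative: it is an assumption about how an instruction-side term could be combined with a response-side term, not the IOPO objective from the paper, which this page does not state. The names `input_output_loss`, `x1`, `x2`, `y1`, and `y2` are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the policy and under the frozen reference model."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def input_output_loss(logp: dict, ref_logp: dict, beta: float = 0.1) -> float:
    """Illustrative only (NOT the paper's objective).

    logp[(x, y)] is the log-probability of response y given instruction x.
    Assumed setup: y1 satisfies instruction x1; y2 does not satisfy x1;
    x2 is a variant instruction that y1 does not satisfy.
    """
    # Output preference: for the same instruction x1, prefer y1 over y2.
    output_term = dpo_loss(logp[("x1", "y1")], logp[("x1", "y2")],
                           ref_logp[("x1", "y1")], ref_logp[("x1", "y2")], beta)
    # Input preference (assumed form): for the same response y1,
    # prefer pairing it with x1 over pairing it with x2.
    input_term = dpo_loss(logp[("x1", "y1")], logp[("x2", "y1")],
                          ref_logp[("x1", "y1")], ref_logp[("x2", "y1")], beta)
    return 0.5 * (output_term + input_term)


if __name__ == "__main__":
    ref = {("x1", "y1"): -10.0, ("x1", "y2"): -10.0, ("x2", "y1"): -10.0}
    policy = {("x1", "y1"): -8.0, ("x1", "y2"): -12.0, ("x2", "y1"): -11.0}
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
    print(input_output_loss(policy, ref))
```

When the policy equals the reference model, every margin is zero and both losses equal ln 2; they fall below that value as the policy separates the preferred pairing from the dispreferred one.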

Code Implementations: 1 repo
