CL · Jun 1, 2023

Preference-grounded Token-level Guidance for Language Model Fine-tuning

Apple
arXiv:2306.00398v3 · 36 citations · h-index: 38
Originality: Incremental advance
AI Analysis

This paper addresses the alignment of language models with preferences, a key challenge in natural language generation. The contribution appears incremental, since it builds on existing preference-learning frameworks.

The paper tackles the granularity mismatch between sequence-level preferences and token-level language model training by developing an iterative process that grounds preferences into token-level guidance and improves the model with this guidance. In experiments, the method performs competitively on discrete-prompt generation and text summarization tasks.

Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level while LM training and generation both occur at the token level. There is, therefore, a granularity mismatch between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternating training process, in which we iterate between grounding the sequence-level preference into token-level training guidance and improving the LM with the learned guidance. For guidance learning, we design a framework that extends pairwise-preference learning from imitation learning to both variable-length LM generation and the use of preferences among multiple generations. For LM training, depending on the amount of supervised data, we present two minimalist learning objectives that utilize the learned guidance. In experiments, our method performs competitively on two distinct representative LM tasks: discrete-prompt generation and text summarization.
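
To make the guidance-learning step concrete, here is a minimal sketch of one way sequence-level preferences over several generations could be turned into a training signal for token-level rewards. It is an illustration under stated assumptions, not the paper's implementation: the mean aggregation, the Plackett-Luce (listwise) likelihood, and the function names `sequence_score` and `listwise_preference_loss` are all choices made for this example and do not come from the abstract.

```python
import math


def sequence_score(token_rewards):
    """Aggregate per-token rewards into one sequence-level score.

    Assumption for this sketch: use the mean, so that sequences of
    different lengths are comparable.
    """
    return sum(token_rewards) / len(token_rewards)


def listwise_preference_loss(ranked_token_rewards):
    """Negative log-likelihood of a preference ordering (best first).

    Assumption for this sketch: a Plackett-Luce model over the
    aggregated scores. With exactly two generations this reduces to
    the usual pairwise (Bradley-Terry) preference loss.
    """
    scores = [sequence_score(r) for r in ranked_token_rewards]
    loss = 0.0
    for i in range(len(scores)):
        tail = scores[i:]
        # Numerically stable log-sum-exp over the not-yet-ranked items.
        m = max(tail)
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_z - scores[i]
    return loss


# Hypothetical token rewards for two generations of different lengths,
# listed from most to least preferred.
preferred_first = [[1.0, 1.0], [0.0]]
loss = listwise_preference_loss(preferred_first)
```

In this sketch, minimizing the loss pushes the token-level rewards so that their aggregate agrees with the sequence-level ordering. An LM-training step could then use those token-level rewards as guidance, and the two steps would alternate as the abstract describes.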

Code Implementations: 2 repos
