CL Jul 5, 2024

On the Low-Rank Parametrization of Reward Models for Controlled Language Generation

arXiv:2407.04615v43 citations h-index: 114
Originality: Incremental advance
AI Analysis

This work addresses the computational overhead of controlled language generation, offering practitioners an incremental improvement in efficiency.

The paper tackles the computational inefficiency of reward-augmented decoding for controlled language generation by proposing a low-rank parametrization of the expert model. The low-rank variant performs on par with higher-rank methods on detoxification and sentiment control tasks while reducing the computational cost to a single reward model call per token.

Language models trained on large amounts of data are known to produce inappropriate content in some cases and require careful tuning to be used in the real world. We revisit an effective and modular approach to controlling language models, in which an external expert model guides the decoding. In particular, we zoom in on the parametrization choice of the external expert, highlighting the difference between low-rank and higher-rank parametrizations. Higher-rank experts are designed to support high flexibility when representing the rewards, which leads to higher computational costs during decoding. However, we demonstrate that they might not use their full flexibility. By analyzing the recently proposed reward-augmented decoding (RAD) approach, which uses a higher-rank expert model, we introduce a simpler but more efficient low-rank parametrization of the expert model that enables fast and effective guided decoding. We empirically show that low-rank RAD performs on par with the more flexible RAD on a detoxification task and a sentiment control task, while requiring only a single reward model call per generated token.
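
The cost difference described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the encoder, the matrices `E_in`, `E_out`, `W_enc`, the head `w_head`, and the constants `TOP_K` and `BETA` are all hypothetical stand-ins, and the inner-product form of the low-rank reward is an assumption about what such a parametrization could look like. The sketch only shows why scoring each candidate continuation separately costs one reward model call per candidate, while a factorized reward needs one call per generated token. Greedy selection is used for simplicity in place of sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, TOP_K, BETA = 50, 8, 5, 2.0  # hypothetical sizes and reward weight

# Hypothetical stand-ins for a reward model's parameters (random, untrained).
E_in = rng.normal(size=(VOCAB, HIDDEN))    # input token embeddings
W_enc = rng.normal(size=(HIDDEN, HIDDEN))  # toy "encoder" weights
w_head = rng.normal(size=HIDDEN)           # scalar reward head
E_out = rng.normal(size=(VOCAB, HIDDEN))   # per-token output embeddings

calls = {"n": 0}  # counts reward-model forward passes


def encode(prefix):
    """Toy reward-model forward pass over a token prefix (counts as one call)."""
    calls["n"] += 1
    return np.tanh(E_in[prefix].mean(axis=0) @ W_enc)


def higher_rank_rewards(prefix, candidates):
    """Score each candidate by re-encoding prefix + candidate: one call each."""
    return np.array([encode(prefix + [c]) @ w_head for c in candidates])


def low_rank_rewards(prefix, candidates):
    """Assumed low-rank form: reward = <output embedding, prefix state>.

    The prefix is encoded once; all candidates are scored with one matrix
    product, so the implied (prefix, token) reward matrix has rank <= HIDDEN.
    """
    h = encode(prefix)
    return E_out[candidates] @ h


def guided_step(lm_logits, prefix, reward_fn):
    """Re-rank the language model's top-k tokens by logit + BETA * reward."""
    candidates = np.argsort(lm_logits)[-TOP_K:]
    adjusted = lm_logits[candidates] + BETA * reward_fn(prefix, list(candidates))
    return int(candidates[np.argmax(adjusted)])
```

Calling `guided_step` with `higher_rank_rewards` increments `calls["n"]` by `TOP_K`, while `low_rank_rewards` increments it by one, which mirrors the "single reward model call per generated token" claim under the assumptions above.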

