CL | May 3, 2022

Mixed-effects transformers for hierarchical adaptation

arXiv:2205.01749v2292 citations | h-index: 75
AI Analysis

This paper addresses domain adaptation for language models when contexts are sparse or extra-textual. It is an incremental improvement over existing prefix-tuning methods.

Language models often fail to adapt to sparse, out-of-sample, or extra-textual contexts such as time, location, or author identity. The paper tackles this problem by introducing mixed-effects transformers (MET), which learn hierarchically-structured prefixes. On domain-adaptation benchmarks, the results show that MET adapts efficiently to novel contexts with minimal data while still generalizing to unseen contexts.

Language use differs dramatically from context to context. To some degree, modern language models like GPT-3 are able to account for such variance by conditioning on a string of previous input text, or prompt. Yet prompting is ineffective when contexts are sparse, out-of-sample, or extra-textual; for instance, accounting for when and where the text was produced or who produced it. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically-structured prefixes -- lightweight modules prepended to the input -- to account for structured variation. Specifically, we show how the popular class of mixed-effects models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it efficiently adapts to novel contexts with minimal data while still effectively generalizing to unseen contexts.
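To make the idea of a hierarchically-structured prefix more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes an additive reading of the mixed-effects analogy: a shared "fixed-effect" prefix plus one learned offset per level of a context path, with an L2 penalty pulling offsets toward zero and dropout randomly skipping offsets during training. The class name `HierarchicalPrefix`, the sizes `PREFIX_LEN` and `DIM`, and the context labels are all hypothetical, and no transformer or training loop is included.

```python
import random

PREFIX_LEN, DIM = 2, 3  # hypothetical sizes: prefix positions x embedding width


class HierarchicalPrefix:
    """Illustrative only: prefix = shared fixed effect + per-level offsets."""

    def __init__(self, dropout=0.1, l2=0.01, seed=0):
        self.rng = random.Random(seed)
        self.dropout = dropout  # chance of skipping a level's offset in training
        self.l2 = l2            # strength of the pull of offsets toward zero
        # Shared prefix used for every context (the "fixed effect").
        self.fixed = [[self.rng.gauss(0.0, 0.02) for _ in range(DIM)]
                      for _ in range(PREFIX_LEN)]
        # Maps a context path (tuple) to its offset matrix ("random effect").
        self.offsets = {}

    def add_context(self, path):
        """Register a zero offset for a path such as ("news", "author_17")."""
        self.offsets.setdefault(
            tuple(path), [[0.0] * DIM for _ in range(PREFIX_LEN)])

    def prefix_for(self, path, training=False):
        """Sum the fixed prefix and the offsets of every known level of `path`."""
        prefix = [row[:] for row in self.fixed]
        for depth in range(1, len(path) + 1):
            offset = self.offsets.get(tuple(path[:depth]))
            if offset is None:
                continue  # unseen level: contributes nothing
            if training and self.rng.random() < self.dropout:
                continue  # dropout: this level is skipped for this example
            for i in range(PREFIX_LEN):
                for j in range(DIM):
                    prefix[i][j] += offset[i][j]
        return prefix

    def penalty(self):
        """L2 penalty on all offsets, to be added to the language-model loss."""
        return self.l2 * sum(v * v for off in self.offsets.values()
                             for row in off for v in row)
```

Under these assumptions, calling `prefix_for(("sports",))` on a model that has only seen `("news",)` returns the shared prefix unchanged, which is how an unseen context would be handled. A context with little data keeps offsets near zero because of the penalty, so it stays close to the shared behaviour.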

Code Implementations: 1 repo
