CL · AI · Sep 15, 2023

How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?

arXiv:2309.08565v3 · 104 citations · h-index: 9 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the data scarcity that keeps customizable machine translation from reaching low-resource languages. The advance is incremental, since it builds on existing controllable generation techniques.

The paper tackles customizing machine translation models for desired attributes, such as formality, in low-resource languages that lack annotated data. It transfers attribute controllers to a pretrained multilingual model and shows that the gap between finetuning and inference-time control closes in zero-shot conditions, with inference-time control also showing stronger domain robustness.

Customizing machine translation models to comply with desired attributes (e.g., formality or grammatical gender) is a well-studied topic. However, most current approaches rely on (semi-)supervised data with attribute annotations. This data scarcity bottlenecks democratizing such customization possibilities to a wider range of languages, particularly lower-resource ones. This gap is out of sync with recent progress in pretrained massively multilingual translation models. In response, we transfer the attribute controlling capabilities to languages without attribute-annotated data with an NLLB-200 model as a foundation. Inspired by techniques from controllable generation, we employ a gradient-based inference-time controller to steer the pretrained model. The controller transfers well to zero-shot conditions, as it operates on pretrained multilingual representations and is attribute-specific rather than language-specific. With a comprehensive comparison to finetuning-based control, we demonstrate that, despite finetuning's clear dominance in supervised settings, the gap to inference-time control closes when moving to zero-shot conditions, especially with new and distant target languages. Inference-time control also shows stronger domain robustness. We further show that our inference-time control complements finetuning. A human evaluation on a real low-resource language, Bengali, confirms our findings. Our code is available at https://github.com/dannigt/attribute-controller-transfer
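To make the idea of a gradient-based inference-time controller concrete, here is a minimal toy sketch. It is not the authors' implementation and does not use NLLB-200. It assumes a hypothetical linear softmax attribute classifier over a small hidden vector, and nudges that vector along the gradient of the target attribute's log-probability while the classifier weights stay fixed. All names (`steer_hidden`, `attribute_probs`, `step_size`, `n_steps`) and all numbers are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_probs(hidden, weights, bias):
    """Hypothetical linear attribute classifier: p(attribute | hidden)."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def steer_hidden(hidden, weights, bias, target, step_size=0.5, n_steps=5):
    """Shift a hidden vector toward the target attribute by gradient ascent.

    Only the hidden vector is updated; the classifier is left unchanged.
    """
    h = list(hidden)  # copy so the caller's vector is not modified
    for _ in range(n_steps):
        probs = attribute_probs(h, weights, bias)
        # For a linear softmax classifier:
        # d/dh log p(target | h) = W[target] - sum_k p_k * W[k]
        grad = [
            weights[target][j] - sum(p * row[j] for p, row in zip(probs, weights))
            for j in range(len(h))
        ]
        # Normalize the gradient so each step has a fixed length.
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        h = [hj + step_size * g / norm for hj, g in zip(h, grad)]
    return h


# Toy setup (all values assumed): two attribute classes, 2-dim hidden state.
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
h0 = [0.2, 0.3]
h1 = steer_hidden(h0, W, b, target=1)
print(attribute_probs(h0, W, b)[1], attribute_probs(h1, W, b)[1])
```

In this toy, the steered vector receives a higher target-attribute probability than the original one. In a real system the hidden vector would come from the translation model's decoder, and the steered representation would feed the next-token prediction; those parts are outside this sketch.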

Code Implementations: 1 repo
