CL · May 22, 2023

Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer

Ruize Gao, Zhirui Zhang, Yichao Du, Lemao Liu, Rui Wang

arXiv:2305.13034v221.1132 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper gives machine translation researchers insight into efficient domain adaptation methods, though the contribution is incremental because it analyzes an existing technique rather than proposing a new one.

The paper investigates why kNN-MT works well for domain adaptation in machine translation. It finds that kNN-MT acts as an implicit gradient descent optimizer on the output projection layer, and shows that kNN-MT combined with adapters matches fine-tuning on in-domain test sets while performing better on out-of-domain test sets.

Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in domain adaptation tasks by integrating pre-trained Neural Machine Translation (NMT) models with domain-specific token-level retrieval. However, the reasons underlying its success have not been thoroughly investigated. In this paper, we comprehensively analyze $k$NN-MT through theoretical and empirical studies. Initially, we provide new insights into the working mechanism of $k$NN-MT as an efficient technique to implicitly execute gradient descent on the output projection layer of NMT, indicating that it is a specific case of model fine-tuning. Subsequently, we conduct multi-domain experiments and word-level analysis to examine the differences in performance between $k$NN-MT and entire-model fine-tuning. Our findings suggest that: (1) Incorporating $k$NN-MT with adapters yields comparable translation performance to fine-tuning on in-domain test sets, while achieving better performance on out-of-domain test sets; (2) Fine-tuning significantly outperforms $k$NN-MT on the recall of in-domain low-frequency words, but this gap could be bridged by optimizing the context representations with additional adapter layers.
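For readers unfamiliar with the token-level retrieval the abstract refers to, the following is a minimal sketch of how a kNN-MT prediction is commonly described: the decoder's context vector queries a datastore of (context representation, target token) pairs, and the retrieved neighbors form a distribution that is interpolated with the NMT model's own output. The abstract does not give these formulas, so the function name `knn_mt_distribution`, the parameters `k`, `temperature`, and `lam`, and the toy datastore are all assumptions made for illustration. The sketch does not implement the paper's gradient-descent analysis.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def knn_mt_distribution(query, keys, values, nmt_probs,
                        k=4, temperature=10.0, lam=0.5):
    """Interpolate an NMT output distribution with a kNN retrieval distribution.

    All names and defaults here are hypothetical, chosen for illustration.
    query:     (d,) decoder context vector at the current step
    keys:      (n, d) stored context vectors (the datastore keys)
    values:    (n,) target token ids paired with each key
    nmt_probs: (V,) the NMT model's softmax output over the vocabulary
    """
    # Squared L2 distance from the query to every datastore key.
    dists = np.sum((keys - query) ** 2, axis=1)
    k = min(k, len(dists))
    nearest = np.argsort(dists)[:k]

    # Closer neighbors receive more weight; temperature flattens the weights.
    weights = softmax(-dists[nearest] / temperature)

    # Accumulate neighbor weights onto their target tokens.
    knn_probs = np.zeros(nmt_probs.shape[0])
    np.add.at(knn_probs, values[nearest], weights)

    # Linear interpolation of the two distributions.
    return lam * knn_probs + (1.0 - lam) * nmt_probs

# Toy datastore: two nearby entries vote for token 2, one distant entry for token 0.
keys = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
values = np.array([2, 2, 0])
nmt_probs = np.array([0.6, 0.3, 0.1])  # the base model prefers token 0
query = np.array([0.0, 0.1])

probs = knn_mt_distribution(query, keys, values, nmt_probs, k=2, lam=0.5)
print(probs.argmax())  # token 2 wins after retrieval
```

In this toy case both retrieved neighbors carry token 2, so the in-domain datastore overrides the base model's preference for token 0 without any parameter update. That behavior is what the abstract relates to an implicit update of the output projection layer.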
