CL · Dec 17, 2022

Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation

ByteDance
arXiv:2212.08822v1 · 2 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in machine translation for researchers and practitioners by improving the retrieval quality of kNNMT. It is an incremental advance, since it builds on existing kNNMT methods.

The paper tackles poor retrieval accuracy in Nearest Neighbor Machine Translation (kNNMT), which arises when datastores are built from suboptimal NMT model representations. It proposes PRED, which generates datastores from pre-trained models and aligns the NMT model's representations with them, and reports gains of up to 2.5 BLEU points on benchmarks such as WMT17 English-Chinese.

Nearest Neighbor Machine Translation (kNNMT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism. The effectiveness of kNNMT depends directly on the quality of the retrieved neighbors. However, the original kNNMT builds datastores from representations produced by NMT models, which results in poor retrieval accuracy when the NMT models are not good enough and leads to sub-optimal translation performance. In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNNMT. Better representations from pre-trained models allow us to build datastores of better quality. We also design a novel contrastive alignment objective to mitigate the representation gap between the NMT model and pre-trained models, enabling the NMT model to retrieve from better datastores. We conduct extensive experiments on both bilingual and multilingual translation benchmarks, including WMT17 English $\leftrightarrow$ Chinese, WMT14 English $\leftrightarrow$ German, IWSLT14 German $\leftrightarrow$ English, and IWSLT14 multilingual datasets. Empirical results demonstrate the effectiveness of PRED.
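The token-level retrieval step described in the abstract can be illustrated with a minimal sketch. It follows the commonly used kNN-MT formulation, in which a distribution built from the nearest datastore entries is interpolated with the NMT model's own distribution. It is not PRED's implementation: the function names (`knn_distribution`, `interpolate`), the toy datastore, and the hyperparameters `k`, `temperature`, and `lam` are all assumptions made for illustration, and brute-force search stands in for the approximate index a real system would use. In PRED, as the abstract describes it, the keys would come from a pre-trained model and the query from an NMT model aligned to it; here both are plain vectors.

```python
import numpy as np


def knn_distribution(query, keys, values, vocab_size, k=4, temperature=10.0):
    """Turn the k nearest datastore entries into a distribution over target tokens."""
    # Squared L2 distance from the query to every datastore key (brute force).
    dists = np.sum((keys - query) ** 2, axis=1)
    k = min(k, len(keys))
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances: closer neighbours receive more weight.
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Sum the weights of neighbours that share the same target token.
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nearest], weights)
    return p_knn


def interpolate(p_nmt, p_knn, lam=0.5):
    """Mix the retrieval distribution with the NMT model's distribution."""
    return lam * p_knn + (1.0 - lam) * p_nmt


# Toy datastore (hypothetical): each key is a decoder-state vector,
# each value is the id of the target token that followed that state.
keys = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
values = np.array([2, 2, 1, 3])

query = np.array([0.0, 0.2])              # current decoder state
p_nmt = np.array([0.1, 0.6, 0.2, 0.1])    # NMT model prefers token 1

p_knn = knn_distribution(query, keys, values, vocab_size=4, k=2)
p_final = interpolate(p_nmt, p_knn, lam=0.5)
print(p_final.argmax())
```

In this toy case both retrieved neighbours carry token 2, so retrieval overrides the NMT model's preference for token 1. The same mechanism explains why retrieval quality matters: if the query and keys live in poorly matched representation spaces, the nearest neighbours carry the wrong tokens and the interpolation hurts rather than helps.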
