CL · Oct 28, 2022

Domain Adaptation of Machine Translation with Crowdworkers

Makoto Morishita, Jun Suzuki, Masaaki Nagata

arXiv:2210.15861v1

Originality: Incremental advance

AI Analysis

This work addresses the need for high-quality domain-specific machine translation in domains with little or no in-domain data, though it is an incremental improvement over existing adaptation methods.

The paper tackles poor machine translation performance when no in-domain data are available by proposing a framework in which crowdworkers efficiently collect parallel sentences from the web for domain adaptation. Across five domains, the domain-adapted model improved BLEU scores by an average of +7.8 points, with a maximum gain of +19.7 points, compared to a general-purpose model.

Although a machine translation model trained on a large in-domain parallel corpus achieves remarkable results, it still performs poorly when no in-domain data are available. This limits the applicability of machine translation to target domains with scarce data, even though there is great demand for high-quality domain-specific translation models in many domains. We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. With the collected parallel data, a machine translation model can be quickly adapted to the target domain. Our experiments show that the proposed method can collect target-domain parallel data within a few days at a reasonable cost. We tested it on five domains, and the domain-adapted model improved BLEU scores by an average of +7.8 points, and by up to +19.7 points, compared to a general-purpose translation model.
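The gains above are reported in BLEU points, i.e. absolute differences on BLEU's 0 to 100 scale, not relative percentages. To make that concrete, here is a minimal sketch of corpus-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty), assuming whitespace tokenization, one reference per sentence, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical, and this is not the paper's evaluation setup: its exact BLEU toolkit and tokenization are not stated here, and standard tools such as sacreBLEU add tokenization and other details this sketch omits.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Hypothetical corpus-level BLEU on a 0-100 scale.

    Assumes whitespace tokenization, a single reference per hypothesis,
    and no smoothing (any zero n-gram precision gives a score of 0).
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # the unsmoothed geometric mean collapses to zero
    # Geometric mean of the modified n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

To compare a general-purpose model with a domain-adapted one, score both systems' outputs against the same in-domain references and subtract: `corpus_bleu(adapted_outputs, refs) - corpus_bleu(general_outputs, refs)` is the gain in BLEU points. Here `adapted_outputs`, `general_outputs`, and `refs` are illustrative lists of sentence strings.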
