CL · Oct 27, 2019

Multitask Learning For Different Subword Segmentations In Neural Machine Translation

Tejas Srinivasan, Ramon Sanabria, Florian Metze

arXiv:1910.12368v125.6645 citations

Originality: Incremental advance

AI Analysis

This work addresses the subword segmentation trade-off faced by NMT practitioners, offering a flexible solution that avoids an extensive search over segmentation strategies, though it is incremental in that it combines existing multitask learning ideas.

The paper tackles the problem of selecting optimal subword segmentation in Neural Machine Translation by proposing Block Multitask Learning (BMTL), which predicts multiple granularities simultaneously, resulting in improvements of up to 1.7 BLEU points over baselines on IWSLT datasets.

In Neural Machine Translation (NMT), the use of subwords and characters as source and target units offers a simple and flexible solution for translating rare and unseen words. However, selecting the optimal subword segmentation involves a trade-off between expressiveness and flexibility, and is language- and dataset-dependent. We present Block Multitask Learning (BMTL), a novel NMT architecture that predicts multiple targets of different granularities simultaneously, removing the need to search for the optimal segmentation strategy. Our multi-task model exhibits improvements of up to 1.7 BLEU points on each decoder over single-task baseline models with the same number of parameters on datasets from two language pairs of IWSLT15 and one from IWSLT19. The multiple hypotheses generated at different granularities can be combined as a post-processing step to give better translations, which improves over hypothesis combination from baseline models while using substantially fewer parameters.
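To make "targets of different granularities" concrete, the following minimal sketch segments one word at three granularities by applying a growing prefix of a BPE-style merge list. It is an illustration only: the merge table `TOY_MERGES` and the helpers `segment` and `granularities` are hypothetical names invented here, and the listing does not say which segmentation tool or merge counts the authors used.

```python
def segment(word, merges):
    """Apply an ordered list of BPE-style merges to the characters of a word."""
    symbols = list(word)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Merge an adjacent (left, right) pair into a single symbol.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge table, ordered as if it had been learned from a corpus.
TOY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]


def granularities(word, merges, sizes):
    """Segment a word once per merge budget: fewer merges give finer units."""
    return {k: segment(word, merges[:k]) for k in sizes}


# One target sequence per granularity (assumed: one decoder per sequence).
targets = granularities("lower", TOY_MERGES, sizes=(0, 2, 4))
for k, units in targets.items():
    print(k, units)
```

With zero merges the target is a character sequence (long, but able to spell any word); with more merges the sequence gets shorter while the vocabulary grows. That is the trade-off the abstract describes, and a multitask model in this spirit would be trained to predict all of these sequences for the same source sentence.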

