CL | Feb 20, 2021

Multilingual Answer Sentence Reranking via Automatically Translated Data

arXiv:2102.10250v11.24 citations

Originality: Incremental advance

AI Analysis

This reduces the complexity and resource requirements for multilingual question answering systems, making them more accessible for languages with limited data.

The paper tackles the problem of building multilingual Answer Sentence Selection models by transferring training data from resource-rich languages like English to others via translation, showing that fine-tuning Transformer models with translated data yields performance within 3% of the state-of-the-art English model.

We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems. The main idea is to transfer data created in one resource-rich language, e.g., English, to other languages that have fewer resources. The main findings of this paper are: (i) AS2 training data translated into a target language can be used to effectively fine-tune a Transformer-based model for that language; (ii) one multilingual Transformer model is enough to rank answers in multiple languages; and (iii) mixed-language question/answer pairs can be used to fine-tune models to select answers from any language, where the input question is in just one language. This greatly reduces the complexity and technical requirements of a multilingual QA system. Our experiments validate the findings above, showing a modest drop, at most 3%, with respect to the state-of-the-art English model.
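The data-transfer idea in findings (i) and (iii) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `AS2Example` record, the `translate_dataset` and `mixed_language_pairs` helpers, and the `toy_translate` stand-in for a machine translation system are all hypothetical names introduced here. The choice to pair every question language with every answer language is likewise an assumption about how mixed-language pairs might be built.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List

# Hypothetical translator signature: (text, target_lang) -> translated text.
Translator = Callable[[str, str], str]


@dataclass(frozen=True)
class AS2Example:
    question: str
    answer: str
    label: int   # 1 if the sentence answers the question, else 0
    q_lang: str  # language of the question
    a_lang: str  # language of the candidate answer sentence


def translate_dataset(examples: List[AS2Example], target_lang: str,
                      translate: Translator) -> List[AS2Example]:
    """Finding (i), sketched: translate question and answer, keep the label."""
    return [
        AS2Example(translate(ex.question, target_lang),
                   translate(ex.answer, target_lang),
                   ex.label, target_lang, target_lang)
        for ex in examples
    ]


def mixed_language_pairs(examples: List[AS2Example], langs: List[str],
                         translate: Translator,
                         source_lang: str = "en") -> List[AS2Example]:
    """Finding (iii), sketched: combine every question language with every
    answer language (an assumption, not necessarily the paper's recipe)."""
    def render(text: str, lang: str) -> str:
        # Source-language text is kept as-is; everything else is translated.
        return text if lang == source_lang else translate(text, lang)

    return [
        AS2Example(render(ex.question, q_lang), render(ex.answer, a_lang),
                   ex.label, q_lang, a_lang)
        for ex in examples
        for q_lang, a_lang in product(langs, repeat=2)
    ]


def toy_translate(text: str, lang: str) -> str:
    """Stand-in for a real MT system: only tags the text with the language."""
    return f"[{lang}] {text}"


english = [
    AS2Example("Who wrote Hamlet?",
               "Hamlet was written by William Shakespeare.", 1, "en", "en"),
    AS2Example("Who wrote Hamlet?", "Hamlet is set in Denmark.", 0, "en", "en"),
]
german = translate_dataset(english, "de", toy_translate)
mixed = mixed_language_pairs(english, ["en", "de", "fr"], toy_translate)
```

Here `german` would serve as a monolingual fine-tuning set for one target language, while `mixed` holds question/answer pairs whose two sides may be in different languages. In both cases the relevance labels carry over unchanged from the English data, which is what makes translation a cheap way to obtain training data.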
