Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering
This work addresses the challenge of making QA systems accessible across languages. The contribution is incremental: it builds on existing multilingual models and adds a fine-tuning step.
The paper studies cross-lingual question answering with multilingually pre-trained language models and finds that explicitly aligning representations across languages through a post-hoc fine-tuning step generally improves performance. It also examines how data size and language choice in that step affect results, and it releases a dataset for evaluating cross-lingual QA systems.
Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages. Hence, for information-seeking question answering (QA) systems to adequately serve speakers of all languages, they need to operate cross-lingually. In this work, we investigate the capabilities of multilingually pre-trained language models on cross-lingual QA. We find that explicitly aligning the representations across languages with a post-hoc fine-tuning step generally leads to improved performance. We additionally investigate the effect of data size and of language choice in this fine-tuning step, and we release a dataset for evaluating cross-lingual QA systems. Code and dataset are publicly available here: https://github.com/ffaisal93/aligned_qa