CL · May 25, 2023

Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages

arXiv:2305.16302v1223 citations
Originality: Incremental advance
AI Analysis

The approach enables stronger AS2 models for low-resource languages, addressing a gap in multilingual NLP. The advance is incremental, however, because it builds on existing distillation and translation methods.

The paper tackles the problem of Answer Sentence Selection (AS2) for low-resource languages by proposing Cross-Lingual Knowledge Distillation (CLKD) from an English teacher, achieving performance that outperforms or rivals supervised fine-tuning with labeled data.

While impressive performance has been achieved on the task of Answer Sentence Selection (AS2) for English, the same does not hold for languages that lack large labeled datasets. In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages without requiring labeled data in the target language. To evaluate our method, we introduce 1) Xtr-WikiQA, a translation-based WikiQA dataset for 9 additional languages, and 2) TyDi-AS2, a multilingual AS2 dataset with over 70K questions spanning 8 typologically diverse languages. We conduct extensive experiments on Xtr-WikiQA and TyDi-AS2 with multiple teachers, diverse monolingual and multilingual pretrained language models (PLMs) as students, and both monolingual and multilingual training. The results show that CLKD outperforms or rivals both supervised fine-tuning with the same amount of labeled data and an approach that combines machine translation with the teacher model. Our method can potentially enable stronger AS2 models for low-resource languages, while TyDi-AS2 can serve as the largest multilingual AS2 dataset for further studies in the research community.
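To make the distillation idea concrete, here is a minimal sketch of a soft-label objective of the kind commonly used in knowledge distillation. It is an illustration under assumptions, not the paper's implementation: the abstract does not state the loss, so the KL-divergence form, the temperature parameter, the two-class logit layout, and all names (`softmax`, `distillation_loss`, the example logits) are hypothetical. The sketch assumes the English teacher scores the English version of a question-candidate pair, while the student scores the same pair in the target language and is trained to match the teacher's output distribution, so no target-language labels are needed.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between softened class distributions.

    Assumption: a standard soft-label distillation loss, chosen here
    for illustration; the paper may use a different objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(
        pi * (math.log(pi) - math.log(qi))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical logits in the order [not an answer, answer].
teacher_logits = [-1.2, 2.3]  # English teacher on the English pair
student_logits = [0.1, 0.4]   # student on the target-language pair

loss = distillation_loss(teacher_logits, student_logits, temperature=1.0)
print(f"distillation loss: {loss:.4f}")
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what lets the teacher's predictions stand in for target-language labels in this sketch.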

