CL | Oct 25, 2023

CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval

arXiv:2310.16528v1 | 131 citations | h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses information retrieval for under-represented languages, but it is incremental as it applies existing methods to a new shared task.

The authors tackled the MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval by building a translate-test system for named entity recognition and question answering in under-represented languages. Their finetuned models did not outperform the baselines because of a domain mismatch between the development data and the shared task validation and test sets.

We present the Charles University system for the MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval. The goal of the shared task was to develop systems for named entity recognition and question answering in several under-represented languages. Our solutions to both subtasks rely on the translate-test approach. We first translate the unlabeled examples into English using a multilingual machine translation model. Then, we run inference on the translated data using a strong task-specific model. Finally, we project the labeled data back into the original language. To keep the inferred tags at the correct positions in the original language, we propose a method that scores the candidate positions with a label-sensitive translation model. In both settings, we also experiment with finetuning the classification models on the translated data. However, due to a domain mismatch between the development data and the shared task validation and test sets, the finetuned models could not outperform our baselines.
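The translate-test pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon-based translator, the gazetteer tagger, and the overlap-based scorer below are hypothetical toy stand-ins for the multilingual translation model, the English task-specific model, and the label-sensitive translation model, and the Czech example sentence is invented for illustration. Only the overall shape (translate, tag in English, then pick the best-scoring candidate span in the original sentence) follows the abstract.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end) in the source sentence

# Hypothetical toy lexicon standing in for a multilingual MT model.
TOY_LEXICON: Dict[str, str] = {
    "Karel": "Charles", "bydlí": "lives", "v": "in", "Praze": "Prague",
}


def toy_translate(tokens: List[str]) -> List[str]:
    """Word-by-word lookup; unknown tokens are copied unchanged."""
    return [TOY_LEXICON.get(t, t) for t in tokens]


def toy_tag(en_tokens: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical English tagger: returns (entity_text, label) pairs."""
    gazetteer = {"Charles": "PER", "Prague": "LOC"}
    return [(t, gazetteer[t]) for t in en_tokens if t in gazetteer]


def toy_score(src_tokens: List[str], span: Span, entity_text: str, label: str) -> float:
    """Toy scorer: reward source tokens whose translation appears in the entity,
    penalise extra tokens. A real scorer would query a translation model and
    condition on `label`; this toy version ignores it."""
    start, end = span
    translated = toy_translate(src_tokens[start:end])
    entity_words = entity_text.split()
    overlap = sum(1 for t in translated if t in entity_words)
    return overlap - 0.5 * (len(translated) - overlap)


def candidate_spans(n_tokens: int, max_len: int) -> List[Span]:
    """Enumerate all spans of up to max_len tokens."""
    return [
        (i, j)
        for i in range(n_tokens)
        for j in range(i + 1, min(i + max_len, n_tokens) + 1)
    ]


def translate_test(
    src_tokens: List[str],
    translate: Callable[[List[str]], List[str]],
    tag: Callable[[List[str]], List[Tuple[str, str]]],
    score: Callable[[List[str], Span, str, str], float],
    max_len: int = 3,
) -> List[Tuple[Span, str]]:
    """Translate, tag in English, then project each label onto the
    highest-scoring candidate span of the original sentence."""
    en_tokens = translate(src_tokens)
    spans = candidate_spans(len(src_tokens), max_len)
    projected = []
    for entity_text, label in tag(en_tokens):
        best = max(spans, key=lambda s: score(src_tokens, s, entity_text, label))
        projected.append((best, label))
    return projected


if __name__ == "__main__":
    sentence = "Karel bydlí v Praze".split()
    print(translate_test(sentence, toy_translate, toy_tag, toy_score))
```

Passing the translator, tagger, and scorer as callables keeps the projection step independent of any particular model, so each toy component could in principle be swapped for a real one without changing `translate_test`.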
