IR · CL · Apr 25, 2022

C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval

arXiv:2204.11989v1 · 23 citations · h-index: 48
Originality: Incremental advance
AI Analysis

This work addresses cross-language retrieval for multilingual applications, but it is incremental as it builds on existing pretraining methods.

The paper tackles the challenge of designing auxiliary tasks for cross-language ad-hoc retrieval: it uses comparable Wikipedia articles in different languages to continue pretraining multilingual models before fine-tuning, which improves retrieval effectiveness.

Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval. Recent work has shown that continuing to pretrain a language model with auxiliary objectives before fine-tuning on the retrieval task can further improve retrieval effectiveness. Unlike monolingual retrieval, designing an appropriate auxiliary task for cross-language mappings is challenging. To address this challenge, we use comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task. We show that our approach yields improvements in retrieval effectiveness.
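As a rough illustration of what a contrastive objective over comparable article pairs can look like, the sketch below computes a generic in-batch contrastive (InfoNCE-style) loss on toy vectors. It is not taken from the paper: the function name `in_batch_contrastive_loss`, the temperature value, the cosine similarity, and the hand-written embeddings are all assumptions made for illustration, and the authors' actual objective and encoder may differ.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def in_batch_contrastive_loss(src_vecs, tgt_vecs, temperature=0.1):
    """Mean InfoNCE-style loss (hypothetical helper, not the paper's code).

    src_vecs[i] and tgt_vecs[i] are assumed to embed a comparable pair,
    e.g. passages from linked Wikipedia articles in two languages.
    Every other target in the batch acts as a negative for source i.
    """
    src = [normalize(v) for v in src_vecs]
    tgt = [normalize(v) for v in tgt_vecs]
    total = 0.0
    for i, s in enumerate(src):
        # Cosine similarity to every target, scaled by the temperature.
        logits = [dot(s, t) / temperature for t in tgt]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true counterpart.
        total += log_denominator - logits[i]
    return total / len(src)


# Toy embeddings standing in for encoder outputs (assumed values).
english = [[1.0, 0.1], [0.0, 1.0]]
aligned = [[0.9, 0.0], [0.1, 1.0]]      # correct counterparts, same order
shuffled = [aligned[1], aligned[0]]     # counterparts swapped

loss_aligned = in_batch_contrastive_loss(english, aligned)
loss_shuffled = in_batch_contrastive_loss(english, shuffled)
print(loss_aligned, loss_shuffled)
```

With correctly paired inputs the loss is lower than with swapped pairs, which is the signal a continued-pretraining stage of this kind would push the encoder toward before fine-tuning on the retrieval task.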

