CLAIIR, Apr 15, 2021

Towards Robust Neural Retrieval Models with Synthetic Pre-Training

arXiv:2104.07800v1, 15 citations
Originality: Incremental advance
AI Analysis

This work addresses robustness issues in neural IR and is relevant to both researchers and practitioners, but it is an incremental advance: it builds on existing methods by adding synthetic training data.

The paper tackles the problem of making neural information retrieval models more robust across different scenarios, including zero-shot settings. It pre-trains the models on synthetic training examples produced by a sequence-to-sequence generator, which improves retrieval performance in both in-domain and out-of-domain evaluation on five test sets.

Recent work has shown that commonly available machine reading comprehension (MRC) datasets can be used to train high-performance neural information retrieval (IR) systems. However, the evaluation of neural IR has so far been limited to standard supervised learning settings, where they have outperformed traditional term matching baselines. We conduct in-domain and out-of-domain evaluations of neural IR, and seek to improve its robustness across different scenarios, including zero-shot settings. We show that synthetic training examples generated using a sequence-to-sequence generator can be effective towards this goal: in our experiments, pre-training with synthetic examples improves retrieval performance in both in-domain and out-of-domain evaluation on five different test sets.
