CL · IR · Jul 7, 2022

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

arXiv:2207.03030v1 · 14 citations · h-index: 41
Originality: Incremental advance
AI Analysis

This work addresses how to train a single retrieval-augmented generation model efficiently across multiple knowledge-intensive tasks, an incremental advance of interest to researchers and practitioners in natural language processing.

The paper tackles multi-task training of retrieval-augmented generation models on knowledge-intensive tasks by cleaning the training set with relevance sampling, which uses the connection between query-answer pairs and items in a knowledge base. The approach substantially improves competitive baselines on two strongly imbalanced tasks, shows smaller improvements or no significant regression on the others, and, with increased model capacity, achieves state-of-the-art results in five of the seven KILT benchmark tasks.

This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: the connection of query-answer pairs to items in the knowledge base. We filter training examples via a confidence threshold on the relevance labels, which indicate whether a pair is answerable by the knowledge base or not. We train a single Fusion-in-Decoder (FiD) generator on seven combined tasks of the KILT benchmark. The experimental results suggest that our simple yet effective approach substantially improves competitive baselines on two strongly imbalanced tasks, and shows either smaller improvements or no significant regression on the remaining tasks. Furthermore, we demonstrate that our multi-task training with relevance label sampling scales well with increased model capacity and achieves state-of-the-art results in five out of seven KILT tasks.
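
The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each training example already carries a relevance confidence in [0, 1], produced by some unspecified scorer, that estimates whether the query-answer pair is answerable from the knowledge base. The `Example` class, the `relevance` field, the `filter_by_relevance` helper, the sample data, and the threshold value of 0.5 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One training example from a combined multi-task pool (hypothetical schema)."""
    task: str         # name of the source task
    query: str
    answer: str
    relevance: float  # assumed confidence that the knowledge base can answer the pair


def filter_by_relevance(examples: List[Example], threshold: float = 0.5) -> List[Example]:
    """Keep only examples whose relevance confidence meets the threshold."""
    return [ex for ex in examples if ex.relevance >= threshold]


# Illustrative data only; the scores are made up.
examples = [
    Example("task_a", "who wrote hamlet", "William Shakespeare", 0.93),
    Example("task_b", "an unsupported claim", "REFUTES", 0.12),
    Example("task_c", "Paris [SEP] capital of", "France", 0.71),
]

kept = filter_by_relevance(examples, threshold=0.5)
kept_tasks = [ex.task for ex in kept]
```

With these made-up scores, the low-confidence `task_b` example is dropped before the examples from all tasks are merged into one training set. Raising the threshold trades training-set size for cleaner supervision.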

