CLIR | Dec 15, 2022

MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers

Microsoft
arXiv:2212.07841v2 | 19 citations | h-index: 70 | Has Code
Originality: Incremental advance
AI Analysis

This work aims to improve dense retrieval performance by integrating multiple pre-training tasks into a single model. The advance is incremental, since it builds on existing masked autoencoder and multi-task learning paradigms.

The paper tackles the challenge of integrating diverse pre-training tasks for dense retrieval by proposing MASTER, a multi-task pre-trained bottlenecked masked autoencoder that unifies three types of tasks. In the reported experiments, it outperforms competitive dense retrieval methods.

Pre-trained Transformers (e.g., BERT) have been commonly used in existing dense retrieval methods for parameter initialization, and recent studies are exploring more effective pre-training tasks to further improve the quality of dense vectors. Although various novel and effective tasks have been proposed, their different input formats and learning objectives make them hard to integrate for jointly improving model performance. In this work, we aim to unify a variety of pre-training tasks in the bottlenecked masked autoencoder manner, and integrate them into a multi-task pre-trained model, namely MASTER. Concretely, MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors. Based on it, we integrate three types of representative pre-training tasks: corrupted passages recovering, related passages recovering and PLMs outputs recovering, to characterize the inner-passage information, inter-passage relations and PLM knowledge, respectively. Extensive experiments have shown that our approach outperforms competitive dense retrieval methods. Our code and data are publicly released at https://github.com/microsoft/SimXNS.
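
The shared-encoder multi-decoder idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the SimXNS repository for that). Everything here is an assumption made for illustration: the names `shared_encoder`, `decoder_loss` and `master_style_loss`, the hash-seeded toy embeddings standing in for learned parameters, the decoders seeing only the dense vector, and the unweighted sum of the three task losses. The only point it tries to show is that all three recovery tasks must read from the same single dense vector.

```python
import math
import random

DIM = 8  # toy embedding width (assumption, not from the paper)


def token_embedding(token, salt):
    """Deterministic toy embedding; stands in for learned parameters."""
    rng = random.Random(f"{salt}:{token}")
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def shared_encoder(tokens):
    """Compress a whole passage into ONE dense vector (the bottleneck)."""
    vecs = [token_embedding(t, "enc") for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def decoder_loss(task, dense, target_tokens, vocab):
    """Task-specific toy decoder. It sees only the dense vector, so the
    target can be recovered only through the bottleneck (a simplification)."""
    logits = {
        w: sum(d * e for d, e in zip(dense, token_embedding(w, task)))
        for w in vocab
    }
    log_z = math.log(sum(math.exp(v) for v in logits.values()))
    # Mean negative log-likelihood of the target tokens.
    return -sum(logits[t] - log_z for t in target_tokens) / len(target_tokens)


def master_style_loss(passage, related_passage, plm_output):
    """One shared encoding, three decoders, summed loss (assumed unweighted)."""
    vocab = sorted(set(passage) | set(related_passage) | set(plm_output))
    dense = shared_encoder(passage)
    losses = {
        "corrupted": decoder_loss("corrupted", dense, passage, vocab),
        "related": decoder_loss("related", dense, related_passage, vocab),
        "plm": decoder_loss("plm", dense, plm_output, vocab),
    }
    return dense, losses, sum(losses.values())


# Hypothetical inputs: a passage, a related passage, and text produced by a PLM.
dense, losses, total = master_style_loss(
    ["dense", "retrieval", "uses", "vectors"],
    ["vectors", "index", "passages"],
    ["retrieval", "with", "vectors"],
)
print(len(dense), sorted(losses), total)
```

In a real system the toy pieces would be replaced by a Transformer encoder and trainable decoders, and gradients from all three losses would flow back into the one shared encoder.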

Code Implementations: 1 repo
