Is Retriever Merely an Approximator of Reader?
This work addresses a fundamental relationship in open-domain QA systems, the one between the retriever and the reader, and offers an incremental improvement to retrieval and end-to-end QA performance.
The paper investigates whether the retriever in open-domain question answering is merely a fast approximation of the reader, finding they are complementary in accuracy, and proposes distilling the reader into the retriever to improve document recall and QA accuracy.
State-of-the-art open-domain question answering (QA) relies on an efficient retriever that drastically reduces the search space for the expensive reader. A rather overlooked question in the community is the relationship between the retriever and the reader, and in particular, whether the sole purpose of the retriever is to serve as a fast approximation of the reader. Our empirical evidence indicates that the answer is no, and that the reader and the retriever are complementary to each other even in terms of accuracy alone. We cautiously conjecture that the architectural constraint of the retriever, which was originally intended to enable approximate search, also seems to make the model more robust in large-scale search. We then propose to distill the reader into the retriever so that the retriever absorbs the strength of the reader while keeping its own benefit. Experimental results show that our method can enhance the document recall rate as well as the end-to-end QA accuracy of off-the-shelf retrievers in open-domain QA tasks.
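The distillation idea can be illustrated with a minimal sketch. The abstract does not state the training objective, so the following assumes a common choice: a KL divergence between the reader's (teacher) distribution over candidate passages and the retriever's (student) distribution, where the retriever scores each passage by a dot product of separately encoded vectors. All function names, the temperature parameter, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of raw scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def retriever_scores(question_vec, passage_vecs):
    """Dot-product scores: the decomposable form that permits approximate search."""
    return [sum(q * p for q, p in zip(question_vec, vec)) for vec in passage_vecs]


def distillation_loss(reader_scores, retr_scores, temperature=1.0):
    """KL(teacher || student) over the same candidate passages (assumed objective)."""
    teacher = log_softmax(reader_scores, temperature)
    student = log_softmax(retr_scores, temperature)
    return sum(math.exp(t) * (t - s) for t, s in zip(teacher, student))


# Toy example: three candidate passages for one question.
question = [1.0, 0.0]
passages = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
reader = [0.5, 3.0, 1.0]  # hypothetical reader scores that prefer the second passage

loss = distillation_loss(reader, retriever_scores(question, passages))
print(f"distillation loss: {loss:.4f}")
```

In this sketch the loss is zero when the retriever ranks passages with the same relative scores as the reader and grows as the two distributions diverge. Minimizing it with respect to the retriever's encoders would pull the retriever toward the reader's preferences while keeping the dot-product form that makes large-scale search feasible.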