Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
This addresses the efficiency problem faced by researchers and practitioners who use large generative models for QA, though the contribution is incremental, building on existing retrieval-augmented methods.
The paper tackles the high computational cost of generative models for open-domain question answering by integrating passage retrieval, achieving state-of-the-art results on Natural Questions and TriviaQA benchmarks, with performance improving as more passages are retrieved.
Generative models for open-domain question answering have proven to be competitive without resorting to external knowledge. While promising, this approach requires models with billions of parameters, which are expensive to train and query. In this paper, we investigate how much these models can benefit from retrieving text passages that potentially contain evidence. We obtain state-of-the-art results on the Natural Questions and TriviaQA open benchmarks. Interestingly, we observe that the performance of this method significantly improves when increasing the number of retrieved passages. This is evidence that generative models are good at aggregating and combining evidence from multiple passages.
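The abstract does not say how a generative reader can consume many retrieved passages at once without its cost exploding. The sketch below is a minimal illustration under an assumed design, not the paper's stated method: the question is paired with each retrieved passage and each pair is encoded as a separate sequence. The input template and the function names (`build_reader_inputs`, `attention_pairs`, `compare_costs`) are hypothetical. Counting the token pairs compared by self-attention shows that separate encoding grows linearly with the number of passages, while concatenating every passage into one sequence grows quadratically.

```python
def build_reader_inputs(question: str, passages: list[str]) -> list[str]:
    # Hypothetical template: pair the question with each passage separately,
    # so every passage becomes its own input sequence.
    return [f"question: {question} context: {p}" for p in passages]


def attention_pairs(seq_lens: list[int]) -> int:
    # Self-attention compares every token with every token in the same
    # sequence, so the work for one sequence grows with its length squared.
    return sum(n * n for n in seq_lens)


def compare_costs(num_passages: int, tokens_per_input: int) -> tuple[int, int]:
    # Separate encoding: num_passages sequences of tokens_per_input tokens each.
    independent = attention_pairs([tokens_per_input] * num_passages)
    # Joint encoding: one sequence holding all passages back to back.
    joint = attention_pairs([tokens_per_input * num_passages])
    return independent, joint


if __name__ == "__main__":
    print(build_reader_inputs("who wrote hamlet", ["passage A", "passage B"]))
    # Illustrative sizes only, not figures from the paper.
    print(compare_costs(num_passages=100, tokens_per_input=250))
```

With 100 passages of 250 tokens each, separate encoding compares 6,250,000 token pairs, whereas a single 25,000-token sequence would compare 625,000,000, a factor equal to the number of passages. This is one reason retrieving more passages can stay affordable under such a design.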