CLIR | Jun 2, 2021

Efficient Passage Retrieval with Hashing for Open-domain Question Answering

arXiv:2106.00882v1 | 728 citations | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the memory constraints of deploying open-domain question answering systems, but the advance is incremental because it builds directly on Dense Passage Retriever (DPR).

The paper tackles the memory inefficiency of neural retrieval models in open-domain question answering by introducing Binary Passage Retriever (BPR), which reduces memory usage from 65GB to 2GB while maintaining accuracy on benchmarks like Natural Questions and TriviaQA.
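The 65GB and 2GB figures can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is illustrative only: the passage count (about 21 million) and the embedding width (768) are assumptions that are not stated in the text above, and the function names are hypothetical.

```python
# Rough index-size arithmetic for a dense index versus a binary-code index.
# NUM_PASSAGES and DIM are assumptions, not values given in the summary above.
NUM_PASSAGES = 21_000_000  # assumed: approximate number of indexed passages
DIM = 768                  # assumed: width of each passage embedding


def dense_index_bytes(num_passages: int, dim: int, bytes_per_float: int = 4) -> int:
    """Size of an index that stores one float32 per dimension."""
    return num_passages * dim * bytes_per_float


def binary_index_bytes(num_passages: int, dim: int) -> int:
    """Size of an index that stores one bit per dimension (8 bits per byte)."""
    return num_passages * dim // 8


dense_gb = dense_index_bytes(NUM_PASSAGES, DIM) / 1e9
binary_gb = binary_index_bytes(NUM_PASSAGES, DIM) / 1e9
print(f"dense index:  {dense_gb:.1f} GB")
print(f"binary index: {binary_gb:.1f} GB")
print(f"reduction:    {dense_gb / binary_gb:.0f}x")
```

Under these assumptions, moving from 32-bit floats to single bits accounts for a 32x reduction, which is consistent with the reported drop from about 65GB to about 2GB.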

Most state-of-the-art open-domain question answering systems use a neural retrieval model to encode passages into continuous vectors and extract them from a knowledge source. However, such retrieval models often require large memory to run because of the massive size of their passage index. In this paper, we introduce Binary Passage Retriever (BPR), a memory-efficient neural retrieval model that integrates a learning-to-hash technique into the state-of-the-art Dense Passage Retriever (DPR) to represent the passage index using compact binary codes rather than continuous vectors. BPR is trained with a multi-task objective over two tasks: efficient candidate generation based on binary codes and accurate reranking based on continuous vectors. Compared with DPR, BPR substantially reduces the memory cost from 65GB to 2GB without a loss of accuracy on two standard open-domain question answering benchmarks: Natural Questions and TriviaQA. Our code and trained models are available at https://github.com/studio-ousia/bpr.
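The two-stage retrieval described in the abstract (candidate generation on binary codes, then reranking with continuous vectors) can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the sign-based hashing, the function names, and the choice to rerank by scoring the continuous query vector against the +1/-1 expansion of the stored codes are all assumptions made for the example. For the actual method, see the repository linked above.

```python
import numpy as np


def to_binary_codes(embeddings: np.ndarray) -> np.ndarray:
    """Hash continuous vectors to bits by sign and pack 8 bits per byte.

    Assumption: a plain sign threshold stands in for the learned hash layer.
    """
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_distances(query_code: np.ndarray, passage_codes: np.ndarray) -> np.ndarray:
    """Hamming distance between one packed query code and every passage code."""
    xor = np.bitwise_xor(passage_codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)


def retrieve(query_vec: np.ndarray, passage_codes: np.ndarray,
             num_candidates: int, top_k: int) -> np.ndarray:
    """Return indices of the top_k passages for a continuous query vector."""
    dim = query_vec.shape[0]

    # Stage 1: cheap candidate generation using Hamming distance on binary codes.
    query_code = to_binary_codes(query_vec[None, :])[0]
    dists = hamming_distances(query_code, passage_codes)
    candidates = np.argsort(dists, kind="stable")[:num_candidates]

    # Stage 2: rerank candidates with the continuous query vector.
    # Assumption: passages are scored via the +1/-1 expansion of their codes,
    # so no continuous passage vectors need to be stored.
    cand_bits = np.unpackbits(passage_codes[candidates], axis=1)[:, :dim]
    cand_signs = cand_bits.astype(np.float32) * 2.0 - 1.0
    scores = cand_signs @ query_vec
    order = np.argsort(-scores, kind="stable")[:top_k]
    return candidates[order]
```

A call might look like `retrieve(query_vec, to_binary_codes(passage_vecs), num_candidates=1000, top_k=100)`, where `passage_vecs` is a hypothetical array of encoder outputs. The first stage touches only packed bits, which is where the memory saving comes from; the second stage restores some ranking precision on a small candidate set.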

Code Implementations: 1 repo
