Encoder Adaptation of Dense Passage Retrieval for Open-Domain Question Answering
This work addresses a specific bottleneck in domain adaptation for open-domain question answering, providing incremental insights into encoder roles in retrieval systems.
The paper investigates how the individual question and passage encoders in dense passage retrieval affect generalization across domains. It finds that the passage encoder more strongly influences the lower bound of performance, while the question encoder affects the upper bound: an out-of-domain passage encoder typically reduces accuracy, whereas an out-of-domain question encoder sometimes improves it.
One key feature of dense passage retrievers (DPR) is the use of separate question and passage encoders in a bi-encoder design. Previous work on the generalization of DPR mainly focuses on testing both encoders in tandem on out-of-distribution (OOD) question-answering (QA) tasks, a setting also known as domain adaptation. However, it is still unknown how DPR's individual question and passage encoders affect generalization. Specifically, in this paper, we want to know how an in-distribution (IND) question/passage encoder would generalize if paired with an OOD passage/question encoder from another domain. We refer to this challenge as \textit{encoder adaptation}. To answer this question, we inspect different combinations of DPR's question and passage encoders learned from five benchmark QA datasets on both in-domain and out-of-domain questions. We find that the passage encoder has more influence on the lower bound of generalization, while the question encoder seems to affect the upper bound in general. For example, applying an OOD passage encoder usually hurts retrieval accuracy, while an OOD question encoder sometimes even improves it.
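The pairing experiment described in the abstract can be sketched as a grid evaluation: every question encoder is combined with every passage encoder, and retrieval accuracy is measured for each pair. The following is a minimal illustration rather than the authors' code. The names `top_k_accuracy` and `encoder_adaptation_grid` are hypothetical, encoders are assumed to be plain callables from text to a vector, and a hit is assumed to mean that a known gold passage index appears in the top k (a simplification of answer-string matching). Scoring uses the inner product, as in DPR.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[str], Vector]  # assumption: an encoder maps text to a dense vector


def dot(u: Vector, v: Vector) -> float:
    """Inner-product similarity between a question vector and a passage vector."""
    return sum(a * b for a, b in zip(u, v))


def top_k_accuracy(
    q_enc: Encoder,
    p_enc: Encoder,
    questions: List[Tuple[str, int]],  # (question text, index of its gold passage)
    passages: List[str],
    k: int = 1,
) -> float:
    """Fraction of questions whose gold passage ranks among the k best-scoring passages."""
    passage_vecs = [p_enc(p) for p in passages]
    hits = 0
    for question, gold_idx in questions:
        q_vec = q_enc(question)
        scores = [dot(q_vec, p_vec) for p_vec in passage_vecs]
        # Rank passage indices from highest to lowest score.
        ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
        hits += gold_idx in ranked[:k]
    return hits / len(questions)


def encoder_adaptation_grid(
    q_encoders: Dict[str, Encoder],  # keyed by the dataset the encoder was trained on
    p_encoders: Dict[str, Encoder],
    questions: List[Tuple[str, int]],
    passages: List[str],
    k: int = 1,
) -> Dict[Tuple[str, str], float]:
    """Evaluate every (question encoder, passage encoder) pairing on one test set."""
    return {
        (q_name, p_name): top_k_accuracy(
            q_encoders[q_name], p_encoders[p_name], questions, passages, k
        )
        for q_name, p_name in product(q_encoders, p_encoders)
    }
```

In such a grid, the diagonal entries correspond to the usual setting where both encoders come from the same dataset, and the off-diagonal entries correspond to encoder adaptation, where one side is swapped for an encoder trained on another domain.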