IR AI CL LG Jun 18, 2024

PromptDSI: Prompt-based Rehearsal-free Continual Learning for Document Retrieval

Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Yinwei Wei, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do

arXiv:2406.12593v4 6.92 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of privacy-restricted data access in continual document retrieval, offering an incremental improvement over existing methods.

The paper tackles continual learning for document retrieval without access to previous data. It introduces PromptDSI, which uses learnable prompts and a topic-aware prompt pool to index new documents efficiently. The results show that PromptDSI variants outperform rehearsal-based baselines, match cache-based methods in mitigating forgetting, and significantly improve retrieval performance on new corpora.

Differentiable Search Index (DSI) utilizes pre-trained language models to perform indexing and document retrieval via end-to-end learning without relying on external indexes. However, DSI requires full re-training to index new documents, causing significant computational inefficiencies. Continual learning (CL) offers a solution by enabling the model to incrementally update without full re-training. Existing CL solutions in document retrieval rely on memory buffers or generative models for rehearsal, which is infeasible when accessing previous training data is restricted due to privacy concerns. To this end, we introduce PromptDSI, a prompt-based, rehearsal-free continual learning approach for document retrieval. PromptDSI follows the Prompt-based Continual Learning (PCL) framework, using learnable prompts to efficiently index new documents without accessing previous documents or queries. To improve retrieval latency, we remove the initial forward pass of PCL, which otherwise greatly increases training and inference time, with a negligible trade-off in performance. Additionally, we introduce a novel topic-aware prompt pool that employs neural topic embeddings as fixed keys, eliminating the instability of prompt key optimization while maintaining competitive performance with existing PCL prompt pools. In a challenging rehearsal-free continual learning setup, we demonstrate that PromptDSI variants outperform rehearsal-based baselines, match the strong cache-based baseline in mitigating forgetting, and significantly improve retrieval performance on new corpora.
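
The idea of a prompt pool with fixed keys can be illustrated with a small sketch. This is not the authors' implementation: the function names (`cosine`, `select_prompts`), the toy two-dimensional keys, and the string placeholders standing in for learnable prompts are all assumptions made for illustration. It only shows the general lookup pattern, in which a query embedding is compared against keys that stay frozen, so only the prompts attached to those keys would need training. How the query embedding is obtained is left out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_prompts(query_emb, keys, prompts, top_k=1):
    """Return indices and prompts of the top_k keys most similar to the query.

    The keys are treated as fixed (for example, topic embeddings computed
    once), so nothing in this lookup is optimized.
    """
    ranked = sorted(
        range(len(keys)),
        key=lambda i: cosine(query_emb, keys[i]),
        reverse=True,
    )
    chosen = ranked[:top_k]
    return chosen, [prompts[i] for i in chosen]

# Hypothetical toy pool: three fixed keys, each paired with a placeholder prompt.
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
prompts = ["prompt_topic_a", "prompt_topic_b", "prompt_topic_c"]

indices, selected = select_prompts([0.9, 0.1], keys, prompts, top_k=1)
print(indices, selected)
```

In a real model the placeholders would be learnable prompt tensors prepended to the language model's input or attention layers, and the keys would come from a topic model rather than being written by hand.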
