IR, LG · Jan 10, 2022

Continual Learning of Long Topic Sequences in Neural Information Retrieval

arXiv:2201.03356v1 · 7 citations
AI Analysis

This paper addresses the challenge of maintaining retrieval performance in IR systems as user interests evolve. The contribution is incremental: it analyzes existing models rather than proposing new solutions.

The paper tackles catastrophic forgetting in neural information retrieval models that learn long streams of topics over time. It finds that whether forgetting occurs depends on specific factors, such as the similarity between tasks and the length of the texts.

In information retrieval (IR) systems, trends and users' interests may change over time, altering the distribution of either the requests or the contents to be recommended. Since neural ranking approaches depend heavily on their training data, it is crucial to understand how well recent IR approaches transfer to new domains in the long term. In this paper, we first propose a dataset, based on the MSMarco corpus, that models a long stream of topics as well as controlled settings driven by IR properties. We then analyze in depth the ability of recent neural IR models to continually learn these streams. Our empirical study highlights the particular cases in which catastrophic forgetting occurs (e.g., the level of similarity between tasks, peculiarities of text length, and the way models are trained), in order to provide future directions for model design.
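
The abstract does not state how forgetting is quantified. One common continual-learning measure, assumed here and not necessarily the one used in the paper, is average forgetting: for each earlier topic, take the best score the model ever reached on it and subtract its score after the last topic in the stream has been learned. The function name `average_forgetting` and the score matrix `perf` (filled with a ranking metric such as MRR@10, also an assumption) are hypothetical and used only for illustration.

```python
from typing import List


def average_forgetting(perf: List[List[float]]) -> float:
    """Average forgetting over a stream of tasks (hypothetical helper).

    perf[i][j] is the score (e.g. MRR@10, an assumption) on task j
    measured after sequentially training on tasks 0..i. Entries with
    j > i (tasks not yet seen) are ignored.
    """
    n = len(perf)
    if n < 2:
        # With a single task nothing can have been forgotten yet.
        return 0.0
    drops = []
    # The final task was just learned, so only tasks 0..n-2 are scored.
    for j in range(n - 1):
        # Best score on task j at any step after it was learned,
        # excluding the final step.
        best_before = max(perf[i][j] for i in range(j, n - 1))
        drops.append(best_before - perf[n - 1][j])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Toy scores for a stream of three topics (illustrative values only).
    perf = [
        [0.30, 0.00, 0.00],
        [0.25, 0.32, 0.00],
        [0.20, 0.28, 0.35],
    ]
    print(round(average_forgetting(perf), 2))  # 0.07
```

A positive value means performance on earlier topics dropped after later topics were learned; a negative value would indicate backward transfer rather than forgetting.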

Code Implementations: 1 repo