Continuous Active Learning Using Pretrained Transformers
This work addresses high-recall information retrieval, in which an application must retrieve substantially all relevant documents. It is an incremental advance that adapts existing transformer methods to this setting.
The paper tackles high-recall information retrieval, where the goal is to retrieve nearly all relevant documents. It investigates transformer-based models such as BERT and T5 for reranking and featurization, and introduces CALBERT, which continuously fine-tunes a BERT-based model on relevance feedback. It reports improvements over the Baseline Model Implementation of the TREC Total Recall Track, the current state of the art.
Pre-trained and fine-tuned transformer models like BERT and T5 have improved the state of the art in ad-hoc retrieval and question-answering, but not as yet in high-recall information retrieval, where the objective is to retrieve substantially all relevant documents. We investigate whether the use of transformer-based models for reranking and/or featurization can improve the Baseline Model Implementation of the TREC Total Recall Track, which represents the current state of the art for high-recall information retrieval. We also introduce CALBERT, a model that can be used to continuously fine-tune a BERT-based model based on relevance feedback.
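To make the relevance-feedback loop concrete, the following is a minimal sketch of continuous active learning in general, not the paper's implementation. It assumes a toy term-weight scorer in place of both the Baseline Model Implementation's classifier and a fine-tuned BERT model, a fixed batch size, and a seed query treated as a synthetic relevant document. All function names and parameters (`train`, `score`, `continuous_active_learning`, `oracle`, `batch_size`) are hypothetical and chosen for illustration only.

```python
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a stand-in for real featurization.
    return text.lower().split()


def train(labeled):
    # Toy model (assumption): each term's weight is the number of relevant
    # labeled texts containing it minus the number of non-relevant ones.
    weights = Counter()
    for text, relevant in labeled:
        for term in set(tokenize(text)):
            weights[term] += 1 if relevant else -1
    return weights


def score(weights, text):
    # Sum the weights of the distinct terms in the text; unseen terms count 0.
    return sum(weights[term] for term in set(tokenize(text)))


def continuous_active_learning(docs, seed_query, oracle, batch_size=2):
    """Return the order in which documents are presented for review.

    docs:   dict mapping document id -> text
    oracle: callable mapping document id -> bool (the reviewer's judgment)
    """
    labeled = [(seed_query, True)]  # seed query as a synthetic relevant doc
    unreviewed = set(docs)
    review_order = []
    while unreviewed:
        # Retrain on all feedback so far, then rank what is left.
        weights = train(labeled)
        ranked = sorted(unreviewed, key=lambda d: (-score(weights, docs[d]), d))
        # Review the top-scoring batch and feed the judgments back in.
        for doc_id in ranked[:batch_size]:
            labeled.append((docs[doc_id], oracle(doc_id)))
            review_order.append(doc_id)
            unreviewed.discard(doc_id)
    return review_order


docs = {
    "d1": "transformer reranking for recall",
    "d2": "cooking pasta recipes",
    "d3": "high recall retrieval with active learning",
    "d4": "football match results",
}
relevant_ids = {"d1", "d3"}
order = continuous_active_learning(
    docs, "high recall retrieval", lambda d: d in relevant_ids
)
```

In this toy collection the two relevant documents are reviewed before the two non-relevant ones. In a CALBERT-style variant, the retraining step would presumably correspond to further fine-tuning of the BERT-based model on the accumulated judgments, but the details of that step are not given in this section.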