IR · Oct 12, 2018

Technology Assisted Reviews: Finding the Last Few Relevant Documents by Asking Yes/No Questions to Reviewers

arXiv:1810.05414v1 · 25 citations
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in document review for legal and compliance domains, offering an incremental improvement over continuous active learning methods.

The paper tackles the problem of efficiently finding the last few relevant documents in technology-assisted reviews, where existing methods plateau at 80-90% recall. It proposes a sequential Bayesian search method that asks reviewers yes/no questions about entities, which improves performance while requiring less reviewing effort.

The goal of a technology-assisted review is to achieve high recall with low human effort. Continuous active learning algorithms have demonstrated good performance in locating the majority of relevant documents in a collection; however, their performance reaches a plateau once 80%-90% of the relevant documents have been found. Finding the last few relevant documents typically requires exhaustively reviewing the collection. In this paper, we propose a novel method to identify these last few, but significant, documents efficiently. Our method hypothesizes that entities carry vital information in documents, and that reviewers can answer questions about the presence or absence of an entity in the missing relevant documents. Based on this, we devise a sequential Bayesian search method that selects the optimal sequence of questions to ask. The experimental results show that our proposed method can greatly improve performance while requiring less reviewing effort.
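To make the idea of asking yes/no questions about entities concrete, here is a minimal sketch of one way such a loop could work. It is not the paper's algorithm: it assumes a single missing relevant document, a noise-free reviewer, and a greedy rule that picks the entity splitting the current belief mass closest to half. The names `pick_entity`, `update`, `belief`, and `doc_entities` are hypothetical and introduced only for illustration.

```python
from typing import Dict, Optional, Set


def pick_entity(
    belief: Dict[str, float],
    doc_entities: Dict[str, Set[str]],
    asked: Set[str],
) -> Optional[str]:
    """Pick the unasked entity whose 'yes' mass is closest to half the belief.

    Assumption: a greedy halving rule stands in for whatever selection
    criterion the paper actually uses.
    """
    total = sum(belief.values())
    # Sort candidates so ties are broken deterministically (alphabetically).
    candidates = sorted({e for d in belief for e in doc_entities[d]} - asked)
    best, best_gap = None, float("inf")
    for entity in candidates:
        yes_mass = sum(p for d, p in belief.items() if entity in doc_entities[d])
        gap = abs(yes_mass / total - 0.5)
        if gap < best_gap:
            best, best_gap = entity, gap
    return best


def update(
    belief: Dict[str, float],
    doc_entities: Dict[str, Set[str]],
    entity: str,
    answer: bool,
) -> Dict[str, float]:
    """Bayes update under a noise-free reviewer (an assumption of this sketch).

    Documents inconsistent with the answer get zero posterior mass and are
    dropped; the remaining mass is renormalised.
    """
    posterior = {
        d: p for d, p in belief.items() if (entity in doc_entities[d]) == answer
    }
    total = sum(posterior.values())
    if total == 0:
        # The answer contradicts every candidate; fall back to the prior.
        return dict(belief)
    return {d: p / total for d, p in posterior.items()}


# Toy collection: four unreviewed documents and the entities they mention.
docs = {"d1": {"a", "b"}, "d2": {"a"}, "d3": {"c"}, "d4": {"b", "c"}}
belief = {d: 0.25 for d in docs}  # uniform prior over the candidates
```

With this toy data, asking about one entity at a time and updating the belief after each answer narrows the candidates down to a single document in two questions. A more realistic version would start from a prior given by a ranking model and allow for reviewer error in the likelihood.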

Code Implementations: 1 repo
