IR, LG · Jun 18, 2021

Heuristic Stopping Rules For Technology-Assisted Review

Eugene Yang, David D. Lewis, Ophir Frieder

arXiv:2106.09871v1 · 24 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific cost-efficiency challenge in legal and information retrieval domains, presenting an incremental improvement over existing heuristics.

The paper addresses when to stop reviewing documents in technology-assisted review workflows so that recall targets are met at minimal cost. It finds that the proposed Quant and QuantCI rules hit recall targets accurately and substantially reduce review costs.

Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e., recall) while also holding down costs. A variety of heuristic stopping rules have been suggested for striking this tradeoff in particular settings, but none have been tested against a range of recall targets and tasks. We propose two new heuristic stopping rules, Quant and QuantCI, based on model-based estimation techniques from survey research. We compare them against a range of proposed heuristics and find they are accurate at hitting a range of recall targets while substantially reducing review costs.
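
To make the recall-versus-cost tradeoff concrete, here is a minimal sketch of a generic model-based stopping check. It is an illustration under stated assumptions, not necessarily the paper's definition of Quant or QuantCI: it assumes the classifier's scores on unreviewed documents are usable as relevance probabilities, and the names `estimated_recall` and `should_stop` are hypothetical.

```python
from typing import Sequence


def estimated_recall(found_relevant: int, unreviewed_probs: Sequence[float]) -> float:
    """Estimate recall as found / (found + expected relevant still unreviewed).

    Assumption: each entry of unreviewed_probs is the model's predicted
    probability that the corresponding unreviewed document is relevant.
    """
    expected_remaining = sum(unreviewed_probs)
    estimated_total = found_relevant + expected_remaining
    if estimated_total == 0:
        # Nothing relevant found and none expected: treat recall as complete.
        return 1.0
    return found_relevant / estimated_total


def should_stop(found_relevant: int,
                unreviewed_probs: Sequence[float],
                target_recall: float) -> bool:
    """Return True once the estimated recall reaches the target."""
    return estimated_recall(found_relevant, unreviewed_probs) >= target_recall
```

For example, with 80 relevant documents found and 40 unreviewed documents each scored 0.5, the expected number of remaining relevant documents is 20, so the estimated recall is 80 / 100. Review would stop for a 0.75 target but continue for a 0.9 target.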
