Heuristic Stopping Rules For Technology-Assisted Review
This work addresses a specific cost-efficiency challenge in legal and information retrieval domains, presenting an incremental improvement over existing heuristics.
The paper tackles the problem of deciding when to stop reviewing documents in technology-assisted review workflows so that a recall target is met at minimal cost. It finds that the proposed Quant and QuantCI rules accurately hit recall targets while substantially reducing review costs.
Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e., recall) while also holding down costs. A variety of heuristic stopping rules have been suggested for striking this tradeoff in particular settings, but none have been tested against a range of recall targets and tasks. We propose two new heuristic stopping rules, Quant and QuantCI, based on model-based estimation techniques from survey research. We compare them against a range of proposed heuristics and find they are accurate at hitting a range of recall targets while substantially reducing review costs.
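To make the idea of a model-based stopping rule concrete, here is a minimal Python sketch. It assumes that a classifier assigns each unreviewed document a probability of relevance, that these probabilities are calibrated, and that relevance is independent across documents. Recall is estimated as the number of relevant documents found so far divided by that number plus the expected count still unreviewed. A more conservative variant inflates the expected remaining count by `z` standard deviations before comparing against the target. All function names and the default `z = 2.0` are hypothetical illustrations. The paper's actual Quant and QuantCI rules may define the estimate and its interval differently, for example in how already-reviewed documents are scored.

```python
import math
from typing import Sequence


def quant_style_recall(found_relevant: int, unreviewed_probs: Sequence[float]) -> float:
    """Hypothetical point estimate of recall.

    Recall ~= found / (found + expected relevant among unreviewed docs),
    where the expected count is the sum of predicted relevance probabilities.
    """
    expected_remaining = sum(unreviewed_probs)
    denom = found_relevant + expected_remaining
    # Convention (assumption): if nothing is found and nothing is expected, treat recall as 1.
    return found_relevant / denom if denom > 0 else 1.0


def quantci_style_lower_bound(
    found_relevant: int, unreviewed_probs: Sequence[float], z: float = 2.0
) -> float:
    """Hypothetical conservative recall estimate.

    Treats each unreviewed document's relevance as an independent Bernoulli(p)
    draw, so the remaining relevant count has variance sum(p * (1 - p)).
    Inflating the expected remaining count by z standard deviations lowers the
    recall estimate, making the stopping decision more cautious.
    """
    expected_remaining = sum(unreviewed_probs)
    std_remaining = math.sqrt(sum(p * (1.0 - p) for p in unreviewed_probs))
    denom = found_relevant + expected_remaining + z * std_remaining
    return found_relevant / denom if denom > 0 else 1.0


def should_stop(
    found_relevant: int,
    unreviewed_probs: Sequence[float],
    target: float,
    use_interval: bool = False,
) -> bool:
    """Stop reviewing once the (optionally conservative) recall estimate meets the target."""
    if use_interval:
        estimate = quantci_style_lower_bound(found_relevant, unreviewed_probs)
    else:
        estimate = quant_style_recall(found_relevant, unreviewed_probs)
    return estimate >= target


if __name__ == "__main__":
    found = 80                # relevant documents identified so far
    probs = [0.5] * 20        # predicted relevance of 20 unreviewed documents
    print(quant_style_recall(found, probs))         # about 0.889
    print(quantci_style_lower_bound(found, probs))  # about 0.847
    print(should_stop(found, probs, target=0.85))                     # True
    print(should_stop(found, probs, target=0.85, use_interval=True))  # False
```

With 80 relevant documents found and 20 unreviewed documents each at probability 0.5, the point estimate (80/90) clears a 0.85 target, but the interval-adjusted estimate does not. This illustrates why a confidence-based rule tends to keep reviewing longer in exchange for a lower risk of missing the target.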