AutoNLU: Detecting, root-causing, and fixing NLU model errors
This work addresses the cumbersome process of improving NLU models for production use, particularly in task-oriented semantic parsing, by scaling quality improvement through automation.
The authors developed AutoNLU, a system that automates the detection, attribution, and correction of NLU model errors in production. It detected four times more failed tasks than random sampling and auto-corrected 65% of identified bugs.
Improving the quality of Natural Language Understanding (NLU) models, and more specifically, task-oriented semantic parsing models, in production is a cumbersome task. In this work, we present a system called AutoNLU, which we designed to scale the NLU quality improvement process. It adds automation to three key steps: detection, attribution, and correction of model errors, i.e., bugs. We detected four times more failed tasks than with random sampling, finding that even a simple active learning sampling method on an uncalibrated model is surprisingly effective for this purpose. The AutoNLU tool empowered linguists to fix ten times more semantic parsing bugs than with prior manual processes, auto-correcting 65% of all identified bugs.
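The abstract does not say which active learning sampling method was used for detection. As an illustration only, the sketch below assumes least-confidence sampling, one common simple choice: utterances whose top predicted class has the lowest softmax probability are sent for review first. The function names, the example utterances, and the logits are all hypothetical and are not taken from AutoNLU.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw (uncalibrated) model scores into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def least_confidence_sample(
    scored: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the `budget` utterances whose top-class probability is lowest.

    `scored` maps each utterance to the model's logits over candidate parses.
    This is an assumed stand-in for the unspecified sampling method.
    """
    ranked = sorted(scored, key=lambda utt: max(softmax(scored[utt])))
    return ranked[:budget]


# Hypothetical traffic sample with made-up logits.
predictions = {
    "set an alarm for 7am": [6.0, 0.5, 0.1],
    "remind me about the thing later": [1.2, 1.0, 0.9],
    "play some jazz": [0.2, 5.5, 0.3],
    "call it off for tomorrow": [2.0, 1.9, 0.1],
}

flagged = least_confidence_sample(predictions, budget=2)
print(flagged)
```

With these made-up scores, the two ambiguous utterances are flagged for review while the two confidently parsed ones are skipped. Ranking by confidence only needs the relative ordering of scores, which is one plausible reason such a method could work even without calibration.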