CL, AI, LG · Sep 4, 2019

An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction

Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang, Jason Mars

arXiv:1909.02027v1 32.81127 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses a gap for developers of task-oriented dialog systems by providing a more realistic benchmark, though the contribution is incremental: it focuses on dataset creation rather than novel methods.

Task-oriented dialog systems must identify out-of-scope queries, yet current datasets do not include such queries. The authors address this by introducing a new dataset with 150 intent classes across 10 domains that also contains out-of-scope examples. They found that benchmark classifiers perform well on in-scope intent classification but struggle with out-of-scope identification.

Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope, i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.
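
To make the idea of an out-of-scope identification scheme concrete, here is a minimal sketch of one common approach: rejecting a query when the classifier's top confidence falls below a threshold. This is an illustration only and is not claimed to be one of the schemes the authors evaluated. The label name `oos`, the intent names, the scores, and the threshold value are all hypothetical.

```python
from typing import Dict, List, Tuple

OOS_LABEL = "oos"  # hypothetical label name for out-of-scope queries


def predict_with_threshold(scores: Dict[str, float], threshold: float) -> str:
    """Return the top-scoring intent, or OOS_LABEL if its score is below threshold."""
    intent, confidence = max(scores.items(), key=lambda kv: kv[1])
    return intent if confidence >= threshold else OOS_LABEL


def evaluate(predictions: List[str], gold: List[str]) -> Tuple[float, float]:
    """Return (in-scope accuracy, out-of-scope recall) for paired label lists."""
    in_scope = [(p, g) for p, g in zip(predictions, gold) if g != OOS_LABEL]
    out_scope = [(p, g) for p, g in zip(predictions, gold) if g == OOS_LABEL]
    in_acc = sum(p == g for p, g in in_scope) / len(in_scope) if in_scope else 0.0
    oos_recall = (
        sum(p == OOS_LABEL for p, _ in out_scope) / len(out_scope)
        if out_scope
        else 0.0
    )
    return in_acc, oos_recall


# Made-up per-intent confidence scores for four queries (not from the dataset).
example_scores = [
    {"book_flight": 0.92, "weather": 0.05, "transfer": 0.03},
    {"book_flight": 0.10, "weather": 0.85, "transfer": 0.05},
    {"book_flight": 0.40, "weather": 0.35, "transfer": 0.25},
    {"book_flight": 0.15, "weather": 0.05, "transfer": 0.80},
]
example_gold = ["book_flight", "weather", OOS_LABEL, OOS_LABEL]

example_preds = [predict_with_threshold(s, threshold=0.5) for s in example_scores]
in_scope_accuracy, oos_recall = evaluate(example_preds, example_gold)
print(example_preds, in_scope_accuracy, oos_recall)
```

In this toy data the in-scope accuracy is 1.0 while out-of-scope recall is 0.5: the fourth query is out-of-scope, but the classifier is confidently wrong about it, so thresholding alone cannot reject it. Reporting the two numbers separately mirrors the distinction the abstract draws between in-scope classification and out-of-scope identification.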
