A Survey of Intent Classification and Slot-Filling Datasets for Task-Oriented Dialog
This survey addresses a problem for researchers and developers of dialog systems: evaluations rely on a limited range of benchmark datasets. Its contribution is incremental, since it compiles existing resources rather than introducing new methods or data.
The authors survey publicly available datasets for intent classification and slot-filling in task-oriented dialog systems. They catalog each dataset's characteristics and discuss its applicability, with the aim of encouraging more robust analyses and making the datasets more accessible.
Interest in dialog systems has grown substantially in the past decade. By extension, so has interest in developing and improving intent classification and slot-filling models, two components commonly used in task-oriented dialog systems. Good evaluation benchmarks are important for comparing and analyzing systems that incorporate such models. Unfortunately, much of the literature in the field analyzes only a small number of benchmark datasets. To promote more robust analyses of task-oriented dialog systems, we have conducted a survey of publicly available datasets for the tasks of intent classification and slot-filling. We catalog the important characteristics of each dataset and discuss its applicability, strengths, and weaknesses. We hope this survey makes these datasets more accessible and thereby encourages their use in future evaluations of intent classification and slot-filling models for task-oriented dialog systems.