CL AI HC LG ML
Dec 17, 2015

A Survey of Available Corpora for Building Data-Driven Dialogue Systems

arXiv:1512.05742v3
356 citations
Has Code
Originality: Synthesis-oriented
AI Analysis

This survey responds to the less obvious progress of data-driven methods in dialogue systems by compiling resources for researchers. It is incremental: it reviews existing datasets without introducing new methods or results.

The authors surveyed publicly available datasets for building data-driven dialogue systems, discussing their characteristics, uses for learning dialogue strategies, and evaluation metrics to facilitate research in this area.

During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choice of evaluation metrics for the learning objective.

Code Implementations
4 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
