CL, AI | Jul 5, 2019

Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues

Shachi Paul, Rahul Goel, Dilek Hakkani-Tür

arXiv:1907.03020v12.013 citations

Originality: Incremental advance

AI Analysis

This work addresses the annotation bottleneck for building task-oriented dialogue systems from human-human conversations, offering a practical solution with incremental improvements in tagging accuracy.

The paper tackles the cost of annotating task-oriented dialogue data by proposing a Universal Dialogue Act schema and aligning existing datasets to it in order to train a tagger. On system turns in human-human dialogues, the tagger reaches an F1 score of 54.1% in the unsupervised setup and 57.7% in the semi-supervised setup.

Machine learning approaches for building task-oriented dialogue systems require large labeled conversational datasets to train on. We are interested in building task-oriented dialogue systems from human-human conversations, which may be available in ample amounts in existing customer care center logs or can be collected from crowd workers. Annotating these datasets can be prohibitively expensive. Recently, multiple annotated task-oriented human-machine dialogue datasets have been released; however, their annotation schemas vary across collections, even for well-defined categories such as dialogue acts (DAs). We propose a Universal DA schema for task-oriented dialogues and align existing annotated datasets with our schema. Our aim is to train a Universal DA tagger (U-DAT) for task-oriented dialogues and use it for tagging human-human conversations. We investigate multiple datasets, propose manual and automated approaches for aligning the different schemas, and present results on a target corpus of human-human dialogues. In unsupervised learning experiments, we achieve an F1 score of 54.1% on system turns in human-human dialogues. In a semi-supervised setup, the F1 score increases to 57.7%, which would otherwise require at least 1.7K manually annotated turns. For new domains, we show further improvements when unlabeled or labeled target domain data is available.
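As a rough illustration of what manually aligning dataset-specific dialogue act labels to a shared schema can look like, here is a minimal sketch. The corpus names, source labels, universal labels, and the `to_universal` helper are all hypothetical placeholders chosen for illustration; they are not the schema or code from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical alignment table: (dataset, source label) -> universal label.
# Every name here is an illustrative assumption, not the paper's schema.
ALIGNMENT: Dict[Tuple[str, str], str] = {
    ("corpus_a", "request"): "REQUEST",
    ("corpus_a", "inform"): "INFORM",
    ("corpus_b", "ask_slot"): "REQUEST",
    ("corpus_b", "provide_info"): "INFORM",
    ("corpus_b", "bye"): "GOODBYE",
}


def to_universal(dataset: str, labels: List[str], unknown: str = "OTHER") -> List[str]:
    """Map one turn's source labels to universal labels, de-duplicated in order."""
    mapped_labels: List[str] = []
    for label in labels:
        # Labels with no alignment entry fall back to a catch-all tag.
        mapped = ALIGNMENT.get((dataset, label), unknown)
        if mapped not in mapped_labels:
            mapped_labels.append(mapped)
    return mapped_labels


# Two differently named source labels end up under the same universal tag.
turn_a = to_universal("corpus_a", ["request"])
turn_b = to_universal("corpus_b", ["ask_slot", "bye"])
```

Once every corpus is relabeled this way, turns from different collections can be pooled into a single training set for one tagger, which is the general idea the abstract describes.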
