CL · Jul 13, 2023

Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues?

AI2 · Microsoft · UW
arXiv:2307.07047v2 · 3 citations · h-index: 114
AI Analysis

This work addresses data scarcity for information extraction in private domains such as call centers, though it is incremental in that it builds on existing dialogue generation methods.

To address the scarcity of releasable data for information extraction from complex human dialogues, the authors introduce a human-in-the-loop dialogue generation framework. Augmenting a small set of real call center conversations with synthetic dialogues yields a 25% relative improvement in F1 score.

The capabilities of pretrained language models have opened opportunities to explore new application areas, but applications involving human-human interaction are limited by the fact that most data is protected from public release for privacy reasons. Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections, preventing successful domain transfer. To support information extraction (IE) for a private call center dataset, we introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues. In IE experiments with auto insurance call center dialogues, we observe a 25% relative improvement in F1 after augmenting a small set of real human conversations with synthetic data. We release code and our synthetic dataset to illustrate the complexity of real-world call center conversations and encourage development of complex dialogue datasets that are more representative of natural data.

