CL | AI | Aug 25, 2025

Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation

arXiv:2508.18210v1 | 1 citation | h-index: 2
Originality: Incremental advance
AI Analysis

The paper addresses data scarcity and privacy constraints in contact center domains, but the advance is incremental: it builds on existing synthetic dialogue generation work and adds a domain-specific focus.

The paper tackles the problem of generating realistic synthetic contact center dialogues by leveraging derived call attributes as supervision and introduces a diagnostic framework with 18 metrics to evaluate quality. Results show that no generation method excels across all traits, with deficits in disfluency, sentiment, and behavioral realism.

Synthetic transcript generation is critical in contact center domains, where privacy and data scarcity limit model training and evaluation. Unlike prior synthetic dialogue generation work on open-domain or medical dialogues, contact center conversations are goal-oriented, role-asymmetric, and behaviorally complex, featuring disfluencies, ASR noise, and compliance-driven agent actions. In deployments where transcripts are unavailable, standard pipelines still yield derived call attributes such as Intent Summaries, Topic Flow, and QA Evaluation Forms. We leverage these as supervision signals to guide generation. To assess the quality of such outputs, we introduce a diagnostic framework of 18 linguistically and behaviorally grounded metrics for comparing real and synthetic transcripts. We benchmark four language-agnostic generation strategies, from simple prompting to characteristic-aware multi-stage approaches, alongside reference-free baselines. Results reveal persistent challenges: no method excels across all traits, with notable deficits in disfluency, sentiment, and behavioral realism. Our diagnostic tool exposes these gaps, enabling fine-grained evaluation and stress testing of synthetic dialogue across languages.
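
To make the idea of comparing real and synthetic transcripts on a single trait concrete, here is a minimal sketch of one such comparison. It is not one of the paper's 18 metrics: the filler-word inventory `FILLERS`, the functions `disfluency_rate` and `metric_gap`, and the sample transcripts are all hypothetical and chosen only for illustration.

```python
import re
from statistics import mean

# Hypothetical filler inventory (an assumption, not taken from the paper).
FILLERS = {"um", "uh", "er", "hmm"}


def disfluency_rate(transcript: str) -> float:
    """Return the fraction of word tokens that are filler words."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    return sum(t in FILLERS for t in tokens) / len(tokens)


def metric_gap(real: list[str], synthetic: list[str]) -> float:
    """Mean disfluency rate of real transcripts minus that of synthetic ones."""
    real_mean = mean(disfluency_rate(t) for t in real)
    synthetic_mean = mean(disfluency_rate(t) for t in synthetic)
    return real_mean - synthetic_mean


# Made-up sample data, for illustration only.
real = ["um I uh need to reset my password", "uh yeah it is not working"]
synthetic = ["I need to reset my password", "Yes it is not working"]

gap = metric_gap(real, synthetic)
print(f"disfluency gap (real - synthetic): {gap:.3f}")
```

A positive gap means the synthetic transcripts are less disfluent than the real ones, which is the kind of deficit the abstract reports. A fuller diagnostic would repeat this comparison for each trait and compare distributions rather than means alone.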

