CL · LG · Mar 15, 2023

Reevaluating Data Partitioning for Emotion Detection in EmoWOZ

arXiv:2303.13364v1 · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses dataset partitioning issues for emotion detection in conversational AI, but it is incremental as it focuses on improving an existing dataset rather than introducing a new method.

The paper tackles distributional shift and imbalanced emotion labels in the EmoWOZ dataset, both of which lead to suboptimal performance in emotion detection models. It proposes a stratified sampling scheme that evens out the label distribution across partitions and improves model performance, making EmoWOZ a more reliable resource.

This paper focuses on the EmoWOZ dataset, an extension of MultiWOZ that provides emotion labels for the dialogues. MultiWOZ was originally partitioned for a different purpose, which results in a distributional shift when the same partitions are reused for emotion recognition. The emotion tags in EmoWOZ are highly imbalanced and unevenly distributed across the partitions, which causes sub-optimal performance and unreliable comparisons between models. We propose a stratified sampling scheme based on emotion tags to address this issue, improve the dataset's distribution, and reduce dataset shift. We also introduce a special technique to handle conversational (sequential) data in which a single dialogue carries many emotion tags. With our proposed sampling method, models built upon EmoWOZ can perform better, making it a more reliable resource for training conversational agents with emotional intelligence. We recommend that future researchers use this new partitioning to ensure consistent and accurate performance evaluations.
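To make the idea of stratifying by emotion tag concrete, the following is a minimal sketch and not the authors' procedure. The abstract does not describe how a dialogue with many tags is assigned to a stratum, so this sketch assumes one simple heuristic: each dialogue is represented by its globally rarest tag, and whole dialogues (never individual turns) are then split proportionally within each stratum. The function name `stratified_dialogue_split`, the input layout, and the 80/10/10 ratios are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_dialogue_split(dialogues, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split whole dialogues into train/dev/test, stratified by emotion tag.

    dialogues: dict mapping a dialogue id to its list of per-turn emotion tags.
    Assumption (not from the paper): a dialogue's stratum is its rarest tag.
    """
    # Count how often each tag occurs over all turns in the corpus.
    tag_counts = Counter(tag for tags in dialogues.values() for tag in tags)

    # Group dialogue ids by their rarest tag; ties are broken alphabetically.
    strata = defaultdict(list)
    for dlg_id, tags in dialogues.items():
        if tags:
            key = min(set(tags), key=lambda t: (tag_counts[t], t))
        else:
            key = "<no-tag>"
        strata[key].append(dlg_id)

    rng = random.Random(seed)
    splits = {"train": [], "dev": [], "test": []}
    for key in sorted(strata):
        ids = sorted(strata[key])  # fixed order before shuffling, for reproducibility
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_dev = round(len(ids) * ratios[1])
        # Consecutive slices, so no dialogue can land in two partitions.
        splits["train"] += ids[:n_train]
        splits["dev"] += ids[n_train:n_train + n_dev]
        splits["test"] += ids[n_train + n_dev:]
    return splits
```

Because the split operates on dialogue ids, the turns of one conversation always stay in the same partition, which keeps the sequential context intact. Very small strata may leave a partition without any example of a rare tag; a real partitioning would need to handle that case explicitly.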
