How Stylistic Similarity Shapes Preferences in Dialogue Dataset with User and Third Party Evaluations
This work addresses the need for better evaluation metrics in dialogue generation by distinguishing subjective from objective stylistic similarity. The contribution is incremental, since it builds on prior suggestions that stylistic similarity between user and system may enhance user impressions.
The study examines how stylistic similarity influences user preferences in dialogue systems. It introduces a dataset with both subjective and objective similarity measures, finds a strong positive correlation between subjective similarity and user preference, and highlights a discrepancy between subjective and objective evaluations.
Recent advancements in dialogue generation have broadened the scope of human-bot interactions, enabling not only contextually appropriate responses but also the analysis of human affect and sensitivity. While prior work has suggested that stylistic similarity between user and system may enhance user impressions, the distinction between subjective and objective similarity is often overlooked. To investigate this issue, we introduce a novel dataset that includes users' preferences, subjective stylistic similarity based on users' own perceptions, and objective stylistic similarity annotated by third-party evaluators in open-domain dialogue settings. Analysis using the constructed dataset reveals a strong positive correlation between subjective stylistic similarity and user preference. Furthermore, our analysis suggests that users' subjective stylistic similarity differs from third-party objective similarity. This underscores the importance of distinguishing between subjective and objective evaluations and understanding the distinct aspects each captures when analyzing the relationship between stylistic similarity and user preferences. The dataset presented in this paper is available online.
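The kind of correlation analysis described above can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient was used, so Spearman's rank correlation is an assumption here, chosen because rating scales are ordinal. The function names and the toy ratings are hypothetical placeholders and are not taken from the dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy ratings on a 1-5 scale (made up for illustration only).
preference = [5, 4, 2, 1, 3, 4]             # user's preference for the system
subjective_similarity = [5, 4, 1, 2, 3, 5]  # similarity as rated by the user
objective_similarity = [3, 2, 4, 3, 1, 5]   # similarity as rated by third parties

print("subjective vs. preference:", spearman(subjective_similarity, preference))
print("objective vs. preference: ", spearman(objective_similarity, preference))
print("subjective vs. objective: ", spearman(subjective_similarity, objective_similarity))
```

Comparing the first two coefficients indicates which similarity measure tracks preference more closely, and the third indicates how far user and third-party judgments agree with each other. With real data, a library routine that also reports significance would normally be preferable to this hand-rolled version.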