CLAIHCLGSep 9, 2024

Multilingual Dyadic Interaction Corpus NoXi+J: Toward Understanding Asian-European Non-verbal Cultural Characteristics and their Influences on Engagement

arXiv:2409.13726v18 citationsh-index: 10
Originality Synthesis-oriented
AI Analysis

This work addresses the challenge of cross-cultural engagement recognition in human-computer interaction, though it is incremental as it builds on existing datasets and methods.

The researchers tackled the problem of understanding cultural differences in non-verbal behaviors during conversations by expanding the NoXi dataset to include Japanese and Chinese dyadic conversations (creating NoXi+J), extracting multimodal features, and analyzing their correlation with engagement. They found that cultural characteristics significantly influenced the importance of input features in LSTM models for engagement prediction across five languages, with SHAP analysis confirming these correlations.

Non-verbal behavior is a central challenge in understanding the dynamics of a conversation and the affective states between interlocutors arising from the interaction. Although psychological research has demonstrated that non-verbal behaviors vary across cultures, limited computational analysis has been conducted to clarify these differences and assess their impact on engagement recognition. To gain a greater understanding of engagement and non-verbal behaviors among a wide range of cultures and language spheres, in this study we conduct a multilingual computational analysis of non-verbal features and investigate their role in engagement and engagement prediction. To achieve this goal, we first expanded the NoXi dataset, which contains interaction data from participants living in France, Germany, and the United Kingdom, by collecting session data of dyadic conversations in Japanese and Chinese, resulting in the enhanced dataset NoXi+J. Next, we extracted multimodal non-verbal features, including speech acoustics, facial expressions, backchanneling and gestures, via various pattern recognition techniques and algorithms. Then, we conducted a statistical analysis of listening behaviors and backchannel patterns to identify culturally dependent and independent features in each language and common features among multiple languages. These features were also correlated with the engagement shown by the interlocutors. Finally, we analyzed the influence of cultural differences in the input features of LSTM models trained to predict engagement for five language datasets. A SHAP analysis combined with transfer learning confirmed a considerable correlation between the importance of input features for a language set and the significant cultural characteristics analyzed.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes