Generalizable Engagement Estimation in Conversation via Domain Prompting and Parallel Attention
This work addresses the need for robust engagement estimation in adaptive human-computer interaction systems, though it appears incremental because it builds on existing attention and domain-adaptation techniques.
The paper tackles poor generalizability in conversational engagement estimation by proposing DAPA, a framework that combines domain prompting with a parallel cross-attention mechanism. It reports state-of-the-art performance, including an absolute improvement of 0.45 in Concordance Correlation Coefficient (CCC) over a strong baseline on the NoXi-J test set, and first place in the Multi-Domain Engagement Estimation Challenge at MultiMediate'25.
Accurate engagement estimation is essential for adaptive human-computer interaction systems, yet robust deployment is hindered by poor generalizability across diverse domains and by the difficulty of modeling complex interaction dynamics. To tackle these issues, we propose DAPA (Domain-Adaptive Parallel Attention), a novel framework for generalizable conversational engagement modeling. DAPA introduces a Domain Prompting mechanism that prepends learnable domain-specific vectors to the input, explicitly conditioning the model on the data's origin to facilitate domain-aware adaptation while preserving generalizable engagement representations. To capture interactional synchrony, the framework also incorporates a Parallel Cross-Attention module that explicitly aligns reactive (forward BiLSTM) and anticipatory (backward BiLSTM) states between participants. Extensive experiments demonstrate that DAPA establishes a new state of the art on several cross-cultural and cross-linguistic benchmarks, notably achieving an absolute improvement of 0.45 in Concordance Correlation Coefficient (CCC) over a strong baseline on the NoXi-J test set. The effectiveness of our method was further confirmed by its first-place finish in the Multi-Domain Engagement Estimation Challenge at MultiMediate'25.
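Because the headline result is reported in CCC, it may help to see how that metric is typically computed. The following is a minimal sketch of the standard Concordance Correlation Coefficient using population (biased) variance and covariance; it is not taken from the DAPA code, and the function name `concordance_cc` and the zero-denominator convention are assumptions made here for illustration.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean(pred) - mean(gold))**2)

    Population (divide-by-n) variance and covariance are used throughout.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")

    mean_p = fmean(pred)
    mean_g = fmean(gold)

    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Both sequences are constant and identical, so CCC is undefined.
        # Returning 0.0 is a convention assumed for this sketch only.
        return 0.0
    return 2.0 * cov / denom


if __name__ == "__main__":
    # Hypothetical per-frame engagement scores, for illustration only.
    predictions = [0.1, 0.4, 0.7]
    annotations = [0.3, 0.6, 0.9]
    print(concordance_cc(predictions, annotations))
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and annotations: in the usage example the two sequences are perfectly correlated, yet the score stays below 1 because their means differ.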