CL · May 22

Emotion Recognition in Sign Language Conversation

Yusong Wang, Keyu Mao, Takao Obi, Minghao Shao, Kotaro Funakoshi

arXiv:2605.2332870.0

AI Analysis

For researchers in affective computing and sign language processing, this work identifies a domain gap and provides a new dataset, but the findings are preliminary and incremental.

The paper introduces emotion recognition in conversation (ERC) to sign language, proposing the eJSL Dialog dataset with 1,920 video samples in 480 dialogues. Benchmarking shows generic multimodal ERC models underperform on sign language, highlighting the need for context-aware visual extractors and larger datasets.

Emotion Recognition in Conversation (ERC) is a core component of affective computing, yet existing sign language emotion datasets focus primarily on isolated sentences and lack conversational context. Models trained exclusively on these isolated utterances show degraded performance in real-world scenarios because they cannot exploit the preceding dialogue. To address this structural limitation, we introduce the ERC task to sign language video analysis and propose the eJSL Dialog dataset. Built from the scripts of the STUDIES corpus, the dataset contains 1,920 video samples organized into 480 unique dialogues. We systematically benchmark this dataset with models ranging from isolated visual networks to multimodal conversational architectures. The results reveal a domain gap when generic multimodal ERC models are applied to sign language. These findings point to a clear need for context-aware visual extractors tailored to sign language, and indicate that scaling up conversational datasets to support large-scale pre-training is a necessary next step for future research.
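To make the difference between the isolated and conversational settings concrete, here is a minimal Python sketch of how utterance-level training samples might be built from a dialogue-structured dataset. Everything in it is an assumption for illustration: the `Utterance` and `Dialogue` classes, their field names, the file paths, the emotion label strings, and the fixed history `window` are hypothetical and are not taken from the actual eJSL Dialog release format. The four-turn example dialogue only mirrors the dataset's average size (1,920 / 480 = 4 samples per dialogue).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Utterance:
    # Hypothetical fields; the real eJSL Dialog format may differ.
    speaker: str      # who signs this turn
    video_path: str   # path to the sign language video clip
    emotion: str      # gold emotion label for this turn


@dataclass
class Dialogue:
    dialogue_id: str
    utterances: List[Utterance] = field(default_factory=list)


Sample = Tuple[List[Utterance], str]  # (turns given to the model, target label)


def isolated_samples(dialogue: Dialogue) -> List[Sample]:
    """Sentence-level setting: each utterance is classified on its own."""
    return [([u], u.emotion) for u in dialogue.utterances]


def conversational_samples(dialogue: Dialogue, window: int = 3) -> List[Sample]:
    """ERC setting: each target turn is paired with up to `window` preceding turns."""
    samples: List[Sample] = []
    for i, target in enumerate(dialogue.utterances):
        start = max(0, i - window)
        # History turns followed by the target turn itself (target is last).
        context = dialogue.utterances[start:i + 1]
        samples.append((context, target.emotion))
    return samples


# Hypothetical four-turn dialogue with made-up labels.
example = Dialogue("d001", [
    Utterance("A", "d001_u0.mp4", "neutral"),
    Utterance("B", "d001_u1.mp4", "sad"),
    Utterance("A", "d001_u2.mp4", "sad"),
    Utterance("B", "d001_u3.mp4", "happy"),
])

if __name__ == "__main__":
    for turns, label in conversational_samples(example, window=2):
        print([u.video_path for u in turns], "->", label)
    # ['d001_u0.mp4'] -> neutral
    # ['d001_u0.mp4', 'd001_u1.mp4'] -> sad
    # ['d001_u0.mp4', 'd001_u1.mp4', 'd001_u2.mp4'] -> sad
    # ['d001_u1.mp4', 'd001_u2.mp4', 'd001_u3.mp4'] -> happy
```

The isolated builder reproduces the sentence-level setting of earlier sign language emotion datasets, where each clip is classified alone; the conversational builder also hands the model the preceding turns, which is the dialogue history that isolated models cannot use. A real ERC model would first encode each clip with a visual feature extractor before aggregating context; that step is omitted here.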
