EASL: Multi-Emotion Guided Semantic Disentanglement for Expressive Sign Language Generation
This work addresses the need for more natural and expressive sign language generation for the Deaf community and represents an incremental improvement over existing methods.
The paper tackles the lack of emotional expressiveness in generated sign language videos by proposing EASL, an emotion-aware architecture that disentangles semantic and affective features and achieves higher pose accuracy than the compared baselines.
Large language models have revolutionized sign language generation by automatically transforming text into high-quality sign language videos, providing accessible communication for the Deaf community. However, existing LLM-based approaches prioritize semantic accuracy while overlooking emotional expression, so their outputs lack naturalness and expressiveness. We propose EASL (Emotion-Aware Sign Language), a multi-emotion-guided generation architecture for fine-grained emotional integration. We introduce emotion-semantic disentanglement modules, trained progressively, that extract semantic and affective features separately. During pose decoding, the emotional representations guide semantic interaction to generate sign poses together with 7-class emotion confidence scores, enabling emotional expression recognition. Experimental results demonstrate that, by integrating multi-emotion information, EASL achieves higher pose accuracy than all compared baselines and adapts effectively to diffusion models to generate expressive sign language videos.
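The data flow described above (separate semantic and affective features, emotion-guided pose decoding, and a 7-class emotion confidence output) can be illustrated with a small sketch. It is not the EASL implementation: the abstract gives no layer definitions, so the linear projections, the sigmoid gate standing in for "emotion-guided semantic interaction", the random weights, and every name and dimension below (make_params, generate_frame, d_text, d_feat, d_pose) are assumptions chosen only to make the idea concrete. Progressive training and the diffusion-based video stage are not modelled.

```python
import math
import random

NUM_EMOTIONS = 7  # from the abstract; the class names are not given there


def linear(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(logits):
    """Numerically stable softmax returning probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_params(d_text, d_feat, d_pose, seed=0):
    """Hypothetical parameter set: two feature branches, a classifier, a gate, a decoder."""
    rng = random.Random(seed)
    return {
        "sem": random_matrix(rng, d_text, d_feat),    # semantic branch
        "emo": random_matrix(rng, d_text, d_feat),    # affective branch
        "cls": random_matrix(rng, d_feat, NUM_EMOTIONS),
        "gate": random_matrix(rng, d_feat, d_feat),
        "pose": random_matrix(rng, d_feat, d_pose),
    }


def generate_frame(text_feat, p):
    """Return (pose vector, 7-class emotion confidences) for one text feature."""
    # Disentanglement stand-in: two separate projections of the same input.
    semantic = linear(text_feat, p["sem"])
    affective = linear(text_feat, p["emo"])
    # Emotion confidence scores over the 7 classes.
    emotion_conf = softmax(linear(affective, p["cls"]))
    # Assumed guidance mechanism: affective features gate the semantic ones.
    gate = [1.0 / (1.0 + math.exp(-g)) for g in linear(affective, p["gate"])]
    guided = [s * g for s, g in zip(semantic, gate)]
    # Decode the guided features into a flat pose vector.
    pose = linear(guided, p["pose"])
    return pose, emotion_conf


params = make_params(d_text=8, d_feat=4, d_pose=6)
pose, emotion_conf = generate_frame([0.5] * 8, params)
```

Here pose is a flat list of 6 values and emotion_conf is a list of 7 probabilities. In a real system the pose dimension would follow the keypoint format of the dataset, which the abstract does not specify.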