CV · Aug 21, 2025

Diverse Signer Avatars with Manual and Non-Manual Feature Modelling for Sign Language Production

arXiv:2508.15988v1 · h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the need for more realistic and varied sign language production for deaf and hard-of-hearing communities, though it appears incremental because it builds on existing latent diffusion models.

The paper tackles the problem of generating diverse, high-quality sign language avatars by modelling both manual and non-manual features. It reports superior visual quality compared to state-of-the-art methods, with significant improvements on perceptual metrics.

The diversity of sign representation is essential for Sign Language Production (SLP) as it captures variations in appearance, facial expressions, and hand movements. However, existing SLP models are often unable to capture diversity while preserving visual quality and modelling non-manual attributes such as emotions. To address this problem, we propose a novel approach that leverages a Latent Diffusion Model (LDM) to synthesise photorealistic digital avatars from a generated reference image. We propose a novel sign feature aggregation module that explicitly models the non-manual features (e.g., the face) and the manual features (e.g., the hands). We show that our proposed module ensures the preservation of linguistic content while seamlessly using reference images with different ethnic backgrounds to ensure diversity. Experiments on the YouTube-SL-25 sign language dataset show that our pipeline achieves superior visual quality compared to state-of-the-art methods, with significant improvements on perceptual metrics.

