HCAIJun 17, 2025

Human-Centered Editable Speech-to-Sign-Language Generation via Streaming Conformer-Transformer and Resampling Hook

arXiv:2506.14677v21 citations
Originality Highly original
AI Analysis

This work addresses the need for more natural, editable, and user-adaptive sign-language technologies for deaf users and interpreters, with incremental improvements in model design and user interaction.

The paper tackled the problem of low naturalness, limited expressivity, and lack of user control in sign-language animation systems by proposing a human-centered, real-time speech-to-sign framework, resulting in a 13 ms average frame-inference time, +13 point SUS improvement, and 6.7 point reduction in cognitive load in studies with deaf signers and interpreters.

Existing end-to-end sign-language animation systems suffer from low naturalness, limited facial/body expressivity, and no user control. We propose a human-centered, real-time speech-to-sign animation framework that integrates (1) a streaming Conformer encoder with an autoregressive Transformer-MDN decoder for synchronized upper-body and facial motion generation, (2) a transparent, editable JSON intermediate representation empowering deaf users and experts to inspect and modify each sign segment, and (3) a human-in-the-loop optimization loop that refines the model based on user edits and ratings. Deployed on Unity3D, our system achieves a 13 ms average frame-inference time and a 103 ms end-to-end latency on an RTX 4070. Our key contributions include the design of a JSON-centric editing mechanism for fine-grained sign-level personalization and the first application of an MDN-based feedback loop for continuous model adaptation. This combination establishes a generalizable, explainable AI paradigm for user-adaptive, low-latency multimodal systems. In studies with 20 deaf signers and 5 professional interpreters, we observe a +13 point SUS improvement, 6.7 point reduction in cognitive load, and significant gains in naturalness and trust (p $<$ .001) over baselines. This work establishes a scalable, explainable AI paradigm for accessible sign-language technologies.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes