CL · CV · Aug 20, 2025

MultiStream-LLM: Bridging Modalities for Robust Sign Language Translation

arXiv:2509.00030v2 · h-index: 3
Originality: Highly original
AI Analysis

This targets accurate sign language translation for deaf and hard-of-hearing communities, and the reported results represent a strong, specific gain rather than an incremental improvement.

The paper tackles robust sign language translation by addressing the failures of monolithic models on fingerspelling and non-manual cues. It introduces MultiStream-LLM, which achieves a BLEU-4 score of 23.5 on How2Sign and 73.2% letter accuracy on ChicagoFSWildPlus.

Despite progress in gloss-free Sign Language Translation (SLT), monolithic end-to-end models consistently fail on two critical components of natural signing: the precise recognition of high-speed fingerspelling and the integration of asynchronous non-manual cues from the face. Recent progress in Automated Sign Language Translation with Large Language Models has sidestepped this challenge, forcing a single network to learn these simultaneously, resulting in poor performance when tasked with translating crucial information such as names, places, and technical terms. We introduce MultiStream-LLM, a modular framework designed to overcome these limitations. Our approach employs separate, specialized predictors for continuous signing, fingerspelling, and lipreading. Each expert network first decodes its specific modality into a sequence of tokens. These parallel streams are then fused by a lightweight transformer that resolves temporal misalignments before passing the combined representation to a Large Language Model (LLM) for final sentence generation. Our method establishes a new state of the art on the How2Sign benchmark with a BLEU-4 score of 23.5 and achieves 73.2% letter accuracy on the challenging ChicagoFSWildPlus fingerspelling dataset. These results validate our core hypothesis: by isolating and solving distinct recognition tasks before fusion, our multi-expert approach provides a more powerful and effective pathway to robust, high-fidelity sign language translation.
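
The data flow described in the abstract (three expert streams, a fusion step, then an LLM) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper fuses streams with a learned lightweight transformer, whereas the sketch below swaps in a plain timestamp sort purely to show how asynchronous token streams might be brought into one sequence. The names `StreamToken`, `fuse_streams`, and `build_prompt`, the `start` timestamp field, and the prompt format are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamToken:
    text: str     # token decoded by one expert network
    start: float  # start time in seconds (hypothetical field)
    stream: str   # "sign", "fingerspell", or "lip"


def fuse_streams(*streams: List[StreamToken]) -> List[StreamToken]:
    """Interleave tokens from all streams by start time.

    Stand-in for the paper's learned fusion transformer: a plain sort
    only orders tokens in time, it does not learn alignments.
    """
    merged = [tok for stream in streams for tok in stream]
    return sorted(merged, key=lambda t: (t.start, t.stream))


def build_prompt(tokens: List[StreamToken]) -> str:
    """Tag each token with its source stream and wrap it in an LLM prompt."""
    tagged = " ".join(f"<{t.stream}>{t.text}" for t in tokens)
    return f"Translate to English: {tagged}"


# Toy expert outputs; the mouthing slightly precedes the fingerspelled name.
sign = [StreamToken("MY", 0.0, "sign"), StreamToken("NAME", 0.4, "sign")]
spell = [StreamToken("A-N-N-A", 0.9, "fingerspell")]
lip = [StreamToken("anna", 0.8, "lip")]

fused = fuse_streams(sign, spell, lip)
prompt = build_prompt(fused)
print(prompt)
```

In this toy setup the lipreading token lands just before the fingerspelled name, so a downstream LLM would see both cues for the same word side by side. A learned fusion module, as in the paper, would be expected to handle cases a fixed sort cannot, such as cues that overlap or arrive out of order.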

