CV · Jan 16, 2025

Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

arXiv:2501.09754v2 · 22 citations · h-index: 16 · CVPR
AI Analysis

This paper addresses accurate sign language translation for deaf and hard-of-hearing communities. It is an incremental advance that integrates multiple contextual sources.

The paper tackles continuous sign language translation by incorporating contextual cues like captions, previous translations, and pseudo-glosses into a framework using a fine-tuned LLM, achieving significant improvements on the BOBSL dataset and competitive results on How2Sign.

Our objective is to translate continuous sign language into spoken language text. Inspired by the way human interpreters rely on context for accurate translation, we incorporate additional contextual cues, together with the signing video, into a new translation framework. Specifically, besides visual sign recognition features that encode the input video, we integrate complementary textual information from (i) captions describing the background show, (ii) translations of previous sentences, and (iii) pseudo-glosses transcribing the signing. These are automatically extracted and fed, along with the visual features, to a pre-trained large language model (LLM), which we fine-tune to generate spoken language translations in text form. Through extensive ablation studies, we show the positive contribution of each input cue to the translation performance. We train and evaluate our approach on BOBSL -- the largest British Sign Language dataset currently available. We show that our contextual approach significantly enhances the quality of the translations compared to previously reported results on BOBSL, and also to state-of-the-art methods that we implement as baselines. Furthermore, we demonstrate the generality of our approach by applying it to How2Sign, an American Sign Language dataset, and achieve competitive results.
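To make the three textual cues concrete, here is a minimal sketch of how they might be assembled into one text prompt for an LLM. The abstract does not specify the prompt layout, so everything here is an assumption for illustration: the `ContextCues` and `build_prompt` names, the field labels, the `<visual_features>` marker (standing in for the position where projected video features would be placed), and the `max_previous` cutoff.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical marker for where visual sign features would be injected;
# the paper's actual mechanism is not described in the abstract.
VISUAL_PLACEHOLDER = "<visual_features>"


@dataclass
class ContextCues:
    """The three textual cues named in the abstract (field names are assumed)."""
    background_caption: str = ""
    previous_translations: List[str] = field(default_factory=list)
    pseudo_glosses: List[str] = field(default_factory=list)


def build_prompt(cues: ContextCues, max_previous: int = 2) -> str:
    """Join the available cues into one prompt, skipping any that are empty."""
    parts = []
    if cues.background_caption:
        parts.append(f"Background: {cues.background_caption}")
    # Keep only the most recent sentences; guard against max_previous == 0,
    # because a [-0:] slice would return the whole list.
    recent = cues.previous_translations[-max_previous:] if max_previous > 0 else []
    if recent:
        parts.append("Previous: " + " ".join(recent))
    if cues.pseudo_glosses:
        parts.append("Glosses: " + " ".join(cues.pseudo_glosses))
    parts.append(f"Video: {VISUAL_PLACEHOLDER}")
    parts.append("Translation:")
    return "\n".join(parts)
```

Skipping empty cues lets the same function cover ablation-style setups in which one or more inputs are withheld.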
