CL · Oct 22, 2025

SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision

arXiv:2510.19398v1 · 3 citations · h-index: 41 · Proceedings of the Tenth Conference on Machine Translation
Originality: Incremental advance
AI Analysis

This work addresses scalability and generalization issues in sign language translation for multilingual applications, though it is incremental as it builds on earlier embedding-based approaches.

The paper tackles limited scalability and cross-language generalization in sign language translation (SLT) by supervising the model with language-agnostic, multimodal embeddings, which enables direct multilingual translation. Experiments show consistent BLEURT gains over text-only sentence embedding supervision, with larger improvements in low-resource settings.

Sign language translation (SLT) is typically trained with text in a single spoken language, which limits scalability and cross-language generalization. Earlier approaches have replaced gloss supervision with text-based sentence embeddings, but up to now, these remain tied to a specific language and modality. In contrast, here we employ language-agnostic, multimodal embeddings trained on text and speech from multiple languages to supervise SLT, enabling direct multilingual translation. To address data scarcity, we propose a coupled augmentation method that combines multilingual target augmentations (i.e. translations into many languages) with video-level perturbations, improving model robustness. Experiments show consistent BLEURT gains over text-only sentence embedding supervision, with larger improvements in low-resource settings. Our results demonstrate that language-agnostic embedding supervision, combined with coupled augmentation, provides a scalable and semantically robust alternative to traditional SLT training.
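To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that embedding supervision can be expressed as a cosine distance between a predicted sentence embedding and a target embedding, and that a video-level perturbation can be approximated by random frame dropping. The abstract specifies neither the loss nor the perturbations, so `cosine_loss`, `drop_frames`, `coupled_augment`, and `toy_encoder` are all hypothetical names chosen for illustration.

```python
import math
import random


def cosine_loss(pred, target):
    """Return 1 - cosine similarity (assumed loss; inputs must be non-zero vectors)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / norm


def drop_frames(frames, keep_prob, rng):
    """Hypothetical video-level perturbation: randomly drop frames, keeping at least one."""
    kept = [f for f in frames if rng.random() < keep_prob]
    return kept or [frames[0]]


def coupled_augment(frames, target_embeddings, rng, keep_prob=0.8):
    """Pair one perturbed video with the embedding of one sampled target language."""
    lang = rng.choice(sorted(target_embeddings))
    return drop_frames(frames, keep_prob, rng), lang, target_embeddings[lang]


def toy_encoder(frames):
    """Stand-in for a sign video encoder: mean-pool per-frame feature vectors."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


rng = random.Random(0)
# Toy per-frame features and toy target embeddings of one sentence in two languages.
frames = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
targets = {"deu": [1.0, 0.1], "eng": [0.98, 0.12]}

aug_frames, lang, target = coupled_augment(frames, targets, rng)
loss = cosine_loss(toy_encoder(aug_frames), target)
```

In this sketch, each training step draws a fresh pairing of a perturbed video and a target language, so the same clip is seen with different targets over time. The toy targets are deliberately close to one another, reflecting the assumption that translations of one sentence lie near each other in a language-agnostic embedding space.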
