CL · CV · Oct 22, 2025

Spatio-temporal Sign Language Representation and Translation

arXiv:2510.19413v1 · 291 citations · h-index: 41 · WMT
Originality: Incremental advance
AI Analysis

This work addresses sign language translation for accessibility, but it is incremental as it builds on existing seq2seq architectures with a focus on spatio-temporal features.

The paper tackles sign language translation from Swiss German Sign Language video to German text with a single model that learns spatio-temporal feature representations together with translation. The aim is better generalization than standard methods, which often neglect temporal features. The system achieved 5±1 BLEU points on the development set but dropped to 0.11±0.06 BLEU points on the test set.

This paper describes the DFKI-MLT submission to the WMT-SLT 2022 sign language translation (SLT) task from Swiss German Sign Language (video) into German (text). State-of-the-art techniques for SLT use a generic seq2seq architecture with customized input embeddings. Instead of word embeddings as used in textual machine translation, SLT systems use features extracted from video frames. Standard approaches often do not benefit from temporal features. In our participation, we present a system that learns spatio-temporal feature representations and translation in a single model, resulting in a real end-to-end architecture expected to better generalize to new data sets. Our best system achieved $5\pm1$ BLEU points on the development set, but the performance on the test set dropped to $0.11\pm0.06$ BLEU points.
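To make the point about temporal features concrete, here is a minimal Python sketch. It is not the authors' architecture: the toy clips, the names `framewise_pool` and `temporal_differences`, and the fixed frame-difference filter (standing in for a learned temporal filter) are all assumptions made for illustration. It only shows why features that ignore frame order cannot distinguish two motions that features computed across consecutive frames can.

```python
from typing import List

# Hypothetical per-frame feature vector (e.g. flattened hand keypoints).
Frame = List[float]


def framewise_pool(frames: List[Frame]) -> List[float]:
    """Average each feature dimension over time; frame order is discarded."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def temporal_differences(frames: List[Frame]) -> List[Frame]:
    """Difference between consecutive frames.

    A fixed, hand-written stand-in for a learned temporal filter.
    """
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


# Two toy clips containing the same frames in opposite order
# (illustrative values only: a point moving right vs. moving left).
moving_right = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
moving_left = list(reversed(moving_right))

# Order-free pooling cannot tell the two clips apart...
assert framewise_pool(moving_right) == framewise_pool(moving_left)
# ...while features computed across consecutive frames can.
assert temporal_differences(moving_right) != temporal_differences(moving_left)
```

In a learned system the frame-difference filter would typically be replaced by trainable layers operating over both space and time; the sketch only isolates the ordering argument.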

