CL Feb 4, 2025

Spatio-temporal transformer to support automatic sign language translation

arXiv:2502.02587v1
Originality: Incremental advance
AI Analysis

This work supports communication for hearing-impaired people by improving sign language translation systems, though it is incremental because it builds on existing Transformer methods.

The paper tackles automatic sign language translation by addressing gesture variability and long-sequence translation, introducing a Transformer-based architecture that encodes spatio-temporal motion gestures. It achieves a BLEU4 score of 46.84% on the Colombian Sign Language Translation Dataset (CoL-SLTD), outperforming baseline approaches, and 30.77% on the RWTH-PHOENIX-Weather-2014T dataset.

Sign Language Translation (SLT) systems support communication for hearing-impaired people by finding equivalences between signed and spoken languages. The task is challenging, however, because of the many sign variations, the complexity of the languages, and the inherent richness of expression. Computational approaches have shown that they can support SLT. Nonetheless, these approaches remain limited in covering gesture variability and in supporting long-sequence translation. This paper introduces a Transformer-based architecture that encodes spatio-temporal motion gestures, preserving both local and long-range spatial information through multiple convolutional and attention mechanisms. The proposed approach was validated on the Colombian Sign Language Translation Dataset (CoL-SLTD), where it outperformed baseline approaches and achieved a BLEU4 score of 46.84%. It was also validated on the RWTH-PHOENIX-Weather-2014T dataset (PHOENIX14T), achieving a BLEU4 score of 30.77% and demonstrating its robustness and effectiveness in handling real-world variations.
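Both results above are reported as BLEU4. As a rough illustration of what that metric measures, the following is a minimal sketch of corpus-level BLEU-4, assuming whitespace-tokenized text, a single reference per hypothesis, and no smoothing. The function names `ngrams` and `corpus_bleu4` are hypothetical and are not taken from the paper; the paper's actual evaluation tooling and tokenization are not stated here, so scores from this sketch may not match the published figures.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(references, hypotheses):
    """Corpus-level BLEU-4 with one reference per hypothesis, no smoothing.

    Assumption: inputs are lists of token lists, aligned by index.
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    matches = [0] * 4   # clipped n-gram matches, for n = 1..4
    totals = [0] * 4    # hypothesis n-gram counts, for n = 1..4
    ref_len = hyp_len = 0

    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, 5):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in hyp_counts.items()
            )
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, a zero match count at any order gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the four modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

A call such as `corpus_bleu4([ref_tokens], [hyp_tokens]) * 100` gives a percentage on the same scale as the 46.84% and 30.77% figures. A hypothesis identical to its reference scores 1.0, and any hypothesis that shares no 4-gram with its reference scores 0.0 under this unsmoothed variant.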

