CV · Jul 2, 2025

Exploring Pose-based Sign Language Translation: Ablation Studies and Attention Insights

Tomas Zelezny, Jakub Straka, Vaclav Javorek, Ondrej Valach, Marek Hruz, Ivan Gruber

arXiv:2507.01532v1

Originality: Synthesis-oriented

AI Analysis

This work addresses sign language translation for accessibility applications, but it is incremental, as it focuses on optimizing existing preprocessing methods.

This paper investigates how pose-based data preprocessing techniques (normalization, interpolation, and augmentation) affect sign language translation performance, finding that they significantly improve model robustness and generalization on the YouTubeASL and How2Sign datasets.
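The three preprocessing steps named above can be illustrated with a minimal sketch. The exact methods used by the authors are not specified here, so everything below is an assumption: poses are NumPy arrays of shape (T, K, 2) with NaN marking undetected keypoints; normalization centers each frame on the shoulder midpoint and divides by shoulder width; interpolation linearly fills missing keypoints over time; augmentation applies one random rotation, scale, and shift per sequence. The shoulder indices, parameter ranges, and function names are hypothetical.

```python
import numpy as np

# Hypothetical shoulder keypoint indices; the real layout depends on the
# pose estimator used and is an assumption of this sketch.
LEFT_SHOULDER, RIGHT_SHOULDER = 0, 1


def interpolate_missing(pose: np.ndarray) -> np.ndarray:
    """Linearly fill NaN gaps over time, independently per keypoint coordinate.

    pose: (T, K, 2) array; NaN marks a missing keypoint. Returns a new array.
    """
    out = pose.copy()
    t = np.arange(pose.shape[0])
    for k in range(pose.shape[1]):
        for c in range(pose.shape[2]):
            series = out[:, k, c]  # a view, so writes go into `out`
            valid = ~np.isnan(series)
            if valid.any() and not valid.all():
                # np.interp holds edge values constant outside the observed range
                series[~valid] = np.interp(t[~valid], t[valid], series[valid])
    return out


def normalize(pose: np.ndarray) -> np.ndarray:
    """Center each frame on the shoulder midpoint and scale by shoulder width."""
    left = pose[:, LEFT_SHOULDER, :]
    right = pose[:, RIGHT_SHOULDER, :]
    center = (left + right) / 2.0                    # (T, 2)
    width = np.linalg.norm(left - right, axis=-1)    # (T,)
    width = np.where(width > 1e-6, width, np.nan)    # avoid dividing by ~0
    return (pose - center[:, None, :]) / width[:, None, None]


def augment(pose: np.ndarray, rng: np.random.Generator,
            max_angle: float = np.pi / 18,
            scale_range: tuple = (0.9, 1.1),
            max_shift: float = 0.05) -> np.ndarray:
    """Apply one random 2D rotation, uniform scale, and shift to a sequence.

    The ranges are illustrative defaults, not values from the paper.
    """
    angle = rng.uniform(-max_angle, max_angle)
    scale = rng.uniform(*scale_range)
    shift = rng.uniform(-max_shift, max_shift, size=2)
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    # (T, K, 2) @ (2, 2) rotates every keypoint; NaNs simply propagate
    return scale * pose @ rot.T + shift
```

Interpolating before normalizing keeps a frame with a missing shoulder from becoming entirely NaN (which would discard its observed hand keypoints); augmentation would typically be applied only during training, e.g. `augment(normalize(interpolate_missing(pose)), np.random.default_rng(0))`.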

Sign Language Translation (SLT) has evolved significantly, moving from isolated recognition approaches to complex, continuous gloss-free translation systems. This paper explores the impact of pose-based data preprocessing techniques (normalization, interpolation, and augmentation) on SLT performance. We employ a transformer-based architecture, using a modified T5 encoder-decoder model adapted to process pose representations. Through extensive ablation studies on the YouTubeASL and How2Sign datasets, we analyze how different preprocessing strategies affect translation accuracy. Our results demonstrate that appropriate normalization, interpolation, and augmentation techniques can significantly improve model robustness and generalization. Additionally, we provide an in-depth analysis of the model's attention patterns and reveal behavior suggesting that adding a dedicated register token can improve overall model performance. We publish our code, including the preprocessed YouTubeASL data, in our GitHub repository.
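The abstract does not describe how the register token is wired into the model. A minimal sketch, assuming the register idea familiar from vision transformers: one or more learned vectors are prepended to the projected pose sequence before the encoder, and the attention mask is extended so every frame can attend to them. The function name and the shapes (B batch, T frames, R registers, D model width) are hypothetical; whether the decoder should also see the register outputs is left open. If the model were a Hugging Face T5, the two returned arrays would play the role of its `inputs_embeds` and `attention_mask` arguments after conversion to tensors.

```python
import numpy as np


def add_register_tokens(embeddings: np.ndarray, mask: np.ndarray,
                        register: np.ndarray):
    """Prepend learned register token(s) to a batch of encoder inputs.

    embeddings: (B, T, D) projected pose frames
    mask:       (B, T) 1 for real frames, 0 for padding
    register:   (R, D) register embeddings (hypothetically trained with the model)
    Returns (B, R + T, D) embeddings and a (B, R + T) mask.
    """
    batch = embeddings.shape[0]
    # Share the same register vectors across every sequence in the batch
    reg = np.broadcast_to(register, (batch,) + register.shape)  # (B, R, D)
    # Register positions are never padding, so they are always attendable
    reg_mask = np.ones((batch, register.shape[0]), dtype=mask.dtype)
    return (np.concatenate([reg, embeddings], axis=1),
            np.concatenate([reg_mask, mask], axis=1))
```

A register gives attention heads a place to "park" probability mass that would otherwise land on arbitrary frames; the register outputs can be sliced off afterwards with `hidden[:, R:]` if only frame outputs should reach the decoder.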
