CV · CL · Feb 18

Gloss-Free Sign Language Translation: An Unbiased Evaluation of Progress in the Field

arXiv:2603.13240 · 3 citations · h-index: 4 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of clarity about genuine progress in sign language translation research by providing an unbiased evaluation, which reveals that many contributions are incremental.

The paper re-implements recent gloss-free sign language translation models in a unified codebase to identify where their performance improvements come from. It finds that many reported gains diminish under consistent evaluation conditions, which highlights how strongly implementation details and evaluation setups affect results.

Sign Language Translation (SLT) aims to automatically convert visual sign language videos into spoken language text and vice versa. While recent years have seen rapid progress, the true sources of performance improvements often remain unclear. Do reported performance gains come from methodological novelty, or from the choice of a different backbone, training optimizations, hyperparameter tuning, or even differences in the calculation of evaluation metrics? This paper presents a comprehensive study of recent gloss-free SLT models by re-implementing key contributions in a unified codebase. We ensure fair comparison by standardizing preprocessing, video encoders, and training setups across all methods. Our analysis shows that many of the performance gains reported in the literature diminish when models are evaluated under consistent conditions, suggesting that implementation details and evaluation setups play a significant role in determining results. We make the codebase publicly available (https://github.com/ozgemercanoglu/sltbaselines) to support transparency and reproducibility in SLT research.
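The abstract asks whether some reported gains stem from differences in how evaluation metrics are calculated. As a minimal illustration (not taken from the paper or its codebase), the sketch below computes a simplified corpus-level BLEU-4, a metric commonly reported in SLT, assuming one reference per sentence, whitespace tokenization, uniform n-gram weights and no smoothing. It shows that a single preprocessing choice (lowercasing) changes the score for identical model output. `ngrams` and `corpus_bleu` are hypothetical helper names; standard toolkits such as sacreBLEU apply their own tokenizers and smoothing, so their absolute numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, max_n=4, lowercase=False):
    """Simplified corpus BLEU in [0, 100] (hypothetical helper).

    Assumes one reference per hypothesis, whitespace tokenization,
    uniform weights over 1..max_n-grams and no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        if lowercase:
            hyp, ref = hyp.lower(), ref.lower()
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram matches at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero n-gram precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty when the hypotheses are shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


# Same output, two metric configurations.
hyps = ["The weather will be sunny tomorrow ."]
refs = ["the weather will be sunny tomorrow ."]
print(round(corpus_bleu(hyps, refs), 2))                  # 80.91 (case-sensitive)
print(round(corpus_bleu(hyps, refs, lowercase=True), 2))  # 100.0 (lowercased)
```

The only difference between the two printed scores is whether casing is normalized before matching, which is the kind of evaluation detail the paper argues must be held fixed when comparing methods.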
