Fingerspelling within Sign Language Translation
This work addresses a specific challenge in sign language processing, with the aim of improving translation accuracy, but it is incremental: it builds on existing methods and datasets.
The paper tackles fingerspelling in sign language translation by evaluating, and trying to improve, how well translation models understand it within full sentences. It finds that a model with character-level tokenization (ByT5) substantially improves translation quality, while mixing fingerspelling recognition data into the training mixture has mixed effects.
Fingerspelling poses challenges for sign language processing due to its high-frequency motion and use for open-vocabulary terms. While prior work has studied fingerspelling recognition, there has been little attention to evaluating how well sign language translation models understand fingerspelling in the context of entire sentences -- and improving this capability. We manually annotate instances of fingerspelling within FLEURS-ASL and use them to evaluate the effect of two simple measures to improve fingerspelling recognition within American Sign Language to English translation: 1) use a model family (ByT5) with character- rather than subword-level tokenization, and 2) mix fingerspelling recognition data into the translation training mixture. We find that 1) substantially improves understanding of fingerspelling (and therefore translation quality overall), but the effect of 2) is mixed.
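To make the first measure concrete, the following is a minimal sketch of byte-level output tokenization in the style of ByT5, not the paper's code. It assumes the commonly documented ByT5 convention that ids 0, 1, and 2 are reserved for padding, end-of-sequence, and unknown, so each UTF-8 byte b maps to id b + 3. The function names and the example name are hypothetical.

```python
# Sketch of ByT5-style byte-level tokenization (assumed convention:
# ids 0/1/2 are pad/EOS/unknown, so byte value b becomes id b + 3).
SPECIAL_OFFSET = 3
EOS_ID = 1


def byte_encode(text):
    """Map text to one id per UTF-8 byte, followed by the EOS id."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byte_decode(ids):
    """Invert byte_encode, skipping special ids and ids outside the byte range."""
    raw = bytes(
        i - SPECIAL_OFFSET
        for i in ids
        if SPECIAL_OFFSET <= i < SPECIAL_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


# Hypothetical fingerspelled name: every letter gets its own target token,
# which mirrors how fingerspelling spells a word out letter by letter.
ids = byte_encode("Nguyen")
print(ids)               # [81, 106, 120, 124, 104, 113, 1]
print(byte_decode(ids))  # Nguyen
```

With a subword vocabulary, a rare name like this would typically be split into a few multi-character pieces, so the model would have to map a sequence of signed letters onto units that do not align with individual letters. With one token per byte, the output units line up with the letters being spelled.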