YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus
The work addresses the scarcity of training data for sign language translation, which benefits researchers and developers in accessibility and NLP. The contribution is incremental, however, since it primarily scales up existing data collection methods.
The paper tackles the data bottleneck in sign language machine learning by introducing YouTube-ASL, a large-scale, open-domain corpus of ASL videos with English captions. The corpus is ~3x as large as the largest prior ASL dataset and has ~10x as many unique signers. Baseline models trained on it achieve a new finetuned state of the art of 12.39 BLEU on How2Sign.
Machine learning for sign languages is bottlenecked by data. In this paper, we present YouTube-ASL, a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions drawn from YouTube. With ~1000 hours of videos and >2500 unique signers, YouTube-ASL is ~3x as large and has ~10x as many unique signers as the largest prior ASL dataset. We train baseline models for ASL to English translation on YouTube-ASL and evaluate them on How2Sign, where we achieve a new finetuned state of the art of 12.39 BLEU and, for the first time, report zero-shot results.
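For readers unfamiliar with the metric behind the 12.39 figure, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale. It is not the paper's evaluation script: the function names `ngrams` and `corpus_bleu` are hypothetical, and the sketch assumes whitespace tokenization, a single reference per hypothesis, and no smoothing. Published scores are normally computed with a standardized tool whose tokenization differs, so numbers from this sketch are not directly comparable.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference string per hypothesis string.

    Assumptions: whitespace tokenization and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Counter intersection clips each n-gram to its reference count.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision yields BLEU 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(model_outputs, reference_captions)` with two equal-length lists of strings returns one score for the whole test set. Because the n-gram counts are pooled across sentences before the precisions are taken, the result is not an average of per-sentence scores.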