Data Augmentation for Sign Language Gloss Translation
This work addresses the low-resource challenge in sign language translation for deaf and hearing communities, but it is incremental: it builds on the existing decomposition of the task and adds rule-based heuristics.
The paper treats gloss-to-text translation for sign languages as a low-resource neural machine translation problem. It exploits the high lexical overlap between glosses and text, and handles their syntactic divergence, to generate pseudo-parallel data from monolingual text. Pre-training on this data yields improvements of up to 3.14 BLEU for ASL-English and 2.20 BLEU for DGS-German.
Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss-to-text translation, where a gloss is a sequence of transcribed spoken-language words in the order in which they are signed. We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem. However, unlike in traditional low-resource NMT, gloss-text pairs often have a higher lexical overlap and lower syntactic overlap than pairs of spoken languages. We exploit this lexical overlap and handle syntactic divergence by proposing two rule-based heuristics that generate pseudo-parallel gloss-text pairs from monolingual spoken language text. By pre-training on the resulting synthetic data, we improve translation from American Sign Language (ASL) to English and German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.
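To make the idea of generating pseudo-glosses from monolingual text concrete, the following is a minimal sketch. It does not reproduce the two heuristics proposed here, whose rules are not spelled out in this abstract. The function-word list, the position-jitter reordering, and the names `pseudo_gloss` and `lexical_overlap` are all illustrative assumptions. The sketch drops function words and uppercases the rest (preserving lexical overlap), optionally perturbs word order (imitating syntactic divergence), and measures overlap as Jaccard similarity over word types.

```python
import random
import string

# Assumption: a tiny hand-written function-word list stands in for a real
# POS tagger or lemmatizer; it is not taken from the paper.
FUNCTION_WORDS = {"a", "an", "the", "is", "are", "was", "were",
                  "am", "be", "to", "of", "do", "does"}

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def _tokenize(sentence):
    """Lowercase, remove ASCII punctuation, and split on whitespace."""
    return sentence.lower().translate(_STRIP_PUNCT).split()


def pseudo_gloss(sentence, rng=None, max_shift=1.0):
    """Turn a spoken-language sentence into a gloss-like token string.

    Hypothetical rules (assumptions, not the paper's):
      1. drop function words,
      2. uppercase the remaining words, as glosses are usually written,
      3. if `rng` is given, jitter each token's position by up to
         `max_shift` and re-sort, producing a mild local reordering.
    """
    kept = [t.upper() for t in _tokenize(sentence) if t not in FUNCTION_WORDS]
    if rng is not None:
        keyed = [(i + rng.uniform(-max_shift, max_shift), t)
                 for i, t in enumerate(kept)]
        kept = [t for _, t in sorted(keyed)]
    return " ".join(kept)


def lexical_overlap(gloss, text):
    """Jaccard similarity between the word types of a gloss and a sentence."""
    g, t = set(_tokenize(gloss)), set(_tokenize(text))
    union = g | t
    return len(g & t) / len(union) if union else 0.0


if __name__ == "__main__":
    sentence = "The cat is sleeping on the sofa."
    gloss = pseudo_gloss(sentence)
    print(gloss)
    print(round(lexical_overlap(gloss, sentence), 3))
    print(pseudo_gloss(sentence, rng=random.Random(0)))
```

Pairs of the form `(pseudo_gloss(s), s)` built from a monolingual corpus could then serve as synthetic pre-training data before fine-tuning on real gloss-text pairs. A more faithful version would likely replace the word list with linguistically informed rules, for example lemmatization and part-of-speech-based filtering.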