An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation
This work provides a standardized, open-source baseline against which researchers in sign language translation can compare methods. Its contribution is incremental, since it builds on existing pipeline approaches.
The authors address the difficulty of comparing sign language translation methods by presenting an open-source baseline pipeline that converts spoken language to signed language through a gloss-based intermediate representation. They demonstrate the pipeline on three language pairs, with dedicated components for text-to-gloss translation and gloss-to-pose conversion.
Sign language translation systems are complex and require many components. As a result, it is very hard to compare methods across publications. We present an open-source implementation of a text-to-gloss-to-pose-to-video pipeline approach, demonstrating conversion from German to Swiss German Sign Language, French to French Sign Language of Switzerland, and Italian to Italian Sign Language of Switzerland. We propose three different components for the text-to-gloss translation: a lemmatizer, a rule-based word reordering and dropping component, and a neural machine translation system. Gloss-to-pose conversion uses a lexicon for each of the three signed languages, with skeletal poses extracted from videos. To generate a sentence, the text-to-gloss system is run first, and the pose representations of the resulting signs are then stitched together.
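The flow described above (text-to-gloss, lexicon lookup, stitching) can be illustrated with a minimal sketch. This is not the authors' implementation: the lemma table, drop-word list, lexicon entries, and the function names `text_to_gloss`, `interpolate`, and `gloss_to_pose` are all hypothetical, the poses are toy two-value frames rather than real skeletal keypoints, and linear interpolation between signs is only one assumed way to stitch them. Word reordering and the neural machine translation component are omitted.

```python
from typing import Dict, List

Frame = List[float]  # flattened keypoint coordinates for one frame
Pose = List[Frame]   # one sign = a sequence of frames

# Hypothetical toy resources, for illustration only.
LEMMAS: Dict[str, str] = {"kauft": "kaufen", "bücher": "buch"}
DROP_WORDS = {"der", "die", "das", "ein", "eine", "einen"}
LEXICON: Dict[str, Pose] = {
    "FRAU": [[0.0, 0.0], [1.0, 1.0]],
    "KAUFEN": [[2.0, 2.0], [3.0, 3.0]],
    "BUCH": [[5.0, 5.0], [6.0, 6.0]],
}


def text_to_gloss(sentence: str) -> List[str]:
    """Drop function words, lemmatize the rest, and uppercase into glosses."""
    tokens = sentence.lower().strip(".!?").split()
    kept = [t for t in tokens if t not in DROP_WORDS]
    return [LEMMAS.get(t, t).upper() for t in kept]


def interpolate(a: Frame, b: Frame, steps: int) -> Pose:
    """Return `steps` evenly spaced frames strictly between frames a and b."""
    return [
        [x + (y - x) * k / (steps + 1) for x, y in zip(a, b)]
        for k in range(1, steps + 1)
    ]


def gloss_to_pose(glosses: List[str], transition_frames: int = 1) -> Pose:
    """Look up each gloss in the lexicon and stitch the signs together."""
    sequence: Pose = []
    for gloss in glosses:
        sign = LEXICON.get(gloss)
        if sign is None:
            continue  # assumption: glosses without a lexicon entry are skipped
        if sequence and transition_frames > 0:
            # Bridge the previous sign's last frame to this sign's first frame.
            sequence.extend(interpolate(sequence[-1], sign[0], transition_frames))
        sequence.extend(sign)
    return sequence


glosses = text_to_gloss("Die Frau kauft ein Buch.")
poses = gloss_to_pose(glosses)
```

Here `glosses` holds the gloss sequence for the example sentence and `poses` holds the stitched frame sequence, with one interpolated frame inserted between each pair of consecutive signs. A real system would replace the toy tables with a proper lemmatizer and a lexicon of poses extracted from sign videos.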