CV · CL · Nov 26, 2024

Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator

arXiv:2411.17799v3 · 8 citations · h-index: 81
Originality: Incremental advance
AI Analysis

This work addresses the communication needs of deaf and hard-of-hearing communities by advancing sign language generation, though it is incremental in building on existing pretrained language models.

The paper tackles the under-explored task of text-to-sign language generation by introducing SOKE, a multilingual model that generates 3D sign avatars from text, achieving improved inference efficiency and precision through multi-head decoding and retrieval-enhanced methods.

Sign language is a visual language that encompasses all linguistic features of natural languages and serves as the primary communication method for the deaf and hard-of-hearing communities. Although many studies have successfully adapted pretrained language models (LMs) for sign language translation (sign-to-text), the reverse task, sign language generation (SLG, text-to-sign), remains largely unexplored. In this work, we introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs using a pretrained LM. To align sign language with the LM, we leverage a decoupled tokenizer that discretizes continuous signs into token sequences representing various body parts. During decoding, unlike existing approaches that flatten all part-wise tokens into a single sequence and predict one token at a time, we propose a multi-head decoding method capable of predicting multiple tokens simultaneously. This approach improves inference efficiency while maintaining effective information fusion across different body parts. To further ease the generation process, we propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs as auxiliary conditions, significantly improving the precision of generated signs. Extensive qualitative and quantitative evaluations demonstrate the effectiveness of SOKE.
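The efficiency argument for multi-head decoding can be illustrated with a toy step count. The sketch below is not the SOKE implementation: the part names in `PARTS`, the functions `flattened_decode` and `multi_head_decode`, and the stand-in predictors are all hypothetical, chosen only to contrast one-token-per-step decoding over a flattened sequence with one-step-per-frame decoding in which each head emits the token for its own body part.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical part names; the abstract only says "various body parts".
PARTS = ["body", "left_hand", "right_hand"]


def flattened_decode(
    num_frames: int, predict_one: Callable[[str, List[int]], int]
) -> Tuple[List[int], int]:
    """Baseline: part-wise tokens interleaved in one sequence, one token per step."""
    sequence: List[int] = []
    steps = 0
    for _ in range(num_frames):
        for part in PARTS:
            sequence.append(predict_one(part, sequence))
            steps += 1  # every single token costs one decoder step
    return sequence, steps


def multi_head_decode(
    num_frames: int, heads: Dict[str, Callable[[List[dict]], int]]
) -> Tuple[List[dict], int]:
    """One decoder step per frame; every head emits its part's token in that step."""
    frames: List[dict] = []
    steps = 0
    for _ in range(num_frames):
        # All heads read the same history, standing in for a shared hidden state.
        frames.append({part: heads[part](frames) for part in PARTS})
        steps += 1
    return frames, steps


# Deterministic stand-ins for a trained model (assumption: 8-entry codebooks).
def toy_predict_one(part: str, history: List[int]) -> int:
    return (PARTS.index(part) + len(history)) % 8


toy_heads = {
    part: (lambda history, p=part: (PARTS.index(p) + len(history)) % 8)
    for part in PARTS
}

seq, flat_steps = flattened_decode(4, toy_predict_one)
frames, mh_steps = multi_head_decode(4, toy_heads)
print(flat_steps, mh_steps)  # 12 4
```

With three parts, the flattened baseline needs three times as many decoder steps for the same number of frames. How SOKE fuses information across parts inside a step is not specified in the abstract and is not modeled here.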

