Autoregressive Sign Language Production: A Gloss-Free Approach with Discrete Representations
This work addresses sign language production for accessibility applications. It is an incremental improvement that applies a novel method to a known bottleneck.
The paper tackles the problem of generating sign language directly from spoken language without gloss intermediaries. It introduces a Vector Quantization Network that derives discrete representations from sign pose sequences and reports superior performance over prior methods.
Gloss-free Sign Language Production (SLP) offers a direct translation of spoken language sentences into sign language, bypassing the need for gloss intermediaries. This paper presents the Sign language Vector Quantization Network, a novel approach to SLP that leverages Vector Quantization to derive discrete representations from sign pose sequences. Our method, rooted in both manual and non-manual elements of signing, supports advanced decoding methods and integrates latent-level alignment for enhanced linguistic coherence. Through comprehensive evaluations, we demonstrate superior performance of our method over prior SLP methods and highlight the reliability of Back-Translation and Fréchet Gesture Distance as evaluation metrics.
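To make the idea of deriving discrete representations from sign pose sequences concrete, the following is a minimal sketch of the generic nearest-neighbor lookup used in vector quantization. It is not the paper's implementation: the abstract does not specify the encoder, the codebook size, or the training objective, so the function names (`quantize`, `dequantize`), the two-dimensional toy features, and the three-entry codebook are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each per-frame feature vector to the index of its nearest codebook entry.

    The resulting index sequence is the discrete representation: a pose
    sequence becomes a sequence of tokens.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look up the codebook vector for each token (the quantized features)."""
    return [list(codebook[i]) for i in indices]


# Toy data (hypothetical): in practice the frames would be learned encoder
# outputs computed from pose keypoints, and the codebook would be learned.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.8], [0.6, 0.6]]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Once pose sequences are expressed as token indices like these, a sequence model can predict them one token at a time from the input sentence, which is the kind of setting where standard decoding strategies for discrete sequences become applicable.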