CV · May 6, 2021

Pose-Guided Sign Language Video GAN with Dynamic Lambda

arXiv:2105.02742v1 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses sign language video synthesis, which could aid communication for deaf and hard-of-hearing communities, but appears incremental as it builds directly on prior methods.

The paper tackles the problem of synthesizing sign language videos by extending previous GAN-based methods with pose guidance and a periodic weighting approach, achieving an SSIM of 0.893 on the MS-ASL dataset, which has over 200 signers.
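For readers unfamiliar with the metric, the following is a minimal sketch of the structural similarity index (SSIM) computed from global image statistics. It is a simplification: standard SSIM averages the same expression over local windows, and the summary does not say which variant or settings the paper used. The function name `global_ssim` and the flat-list image representation are assumptions made for illustration; the constants `k1 = 0.01` and `k2 = 0.03` are the commonly used defaults.

```python
def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two equally sized grayscale images.

    x, y: flat sequences of pixel intensities (hypothetical representation).
    Note: standard SSIM averages this quantity over local windows; this
    sketch uses one window covering the whole image.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population variances and covariance
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Stabilizing constants that avoid division by near-zero values
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    frame = [0, 50, 100, 255]
    inverted = [255 - p for p in frame]
    print(global_ssim(frame, frame))     # identical frames score 1
    print(global_ssim(frame, inverted))  # dissimilar frames score lower
```

A score of 1 means the two images are identical under this measure, so a reported value of 0.893 indicates generated frames that are structurally close to the reference frames.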

We propose a novel approach for the synthesis of sign language videos using GANs. We extend the previous work of Stoll et al. by using the human semantic parser of the Soft-Gated Warping-GAN to produce photorealistic videos guided by region-level spatial layouts. Synthesizing target poses improves performance on independent and contrasting signers. We therefore evaluated our system on the highly heterogeneous MS-ASL dataset, which has over 200 signers, obtaining an SSIM of 0.893. Furthermore, we introduce a periodic weighting approach for the generator that reactivates the training and leads to quantitatively better results.
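The abstract does not specify the form of the periodic weighting, so the following is only an illustrative sketch of what a periodically varying generator weight (a "dynamic lambda") could look like. The cosine shape, the names `periodic_lambda` and `generator_loss`, and all numeric values are assumptions, not the authors' schedule.

```python
import math


def periodic_lambda(step, period=1000, lam_min=1.0, lam_max=10.0):
    """Hypothetical cosine schedule for the generator loss weight.

    Starts at lam_max, falls to lam_min at half a period, then rises
    back to lam_max, repeating every `period` training steps.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    phase = 2.0 * math.pi * (step % period) / period
    return lam_min + (lam_max - lam_min) * 0.5 * (1.0 + math.cos(phase))


def generator_loss(adversarial_loss, reconstruction_loss, step):
    """Combine the two generator terms with the step-dependent weight.

    Assumption: the weight scales a reconstruction term; the abstract
    does not say which term the periodic weight applies to.
    """
    return adversarial_loss + periodic_lambda(step) * reconstruction_loss


if __name__ == "__main__":
    for step in (0, 250, 500, 750, 1000):
        print(step, round(periodic_lambda(step), 3))
```

The intuition behind such a schedule is that repeatedly changing the balance between loss terms can keep the generator from settling once the discriminator dominates, which is one plausible reading of "reactivates the training".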

