Pose-Guided Sign Language Video GAN with Dynamic Lambda
This work addresses sign language video synthesis, which could aid communication for deaf and hard-of-hearing communities, but appears incremental as it builds directly on prior methods.
The paper tackles sign language video synthesis by extending previous GAN-based methods with pose guidance and a periodic weighting approach, achieving an SSIM of 0.893 on the MS-ASL dataset, which contains over 200 signers.
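For context on the reported metric, the sketch below computes a simplified, single-window SSIM between two grayscale images. It is an illustration only: standard SSIM averages the same statistic over local sliding windows, and the paper's exact evaluation setup is not stated here. The function name `global_ssim` and the flat-list image representation are assumptions made for this example.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM for two equally sized grayscale images.

    Images are given as flat sequences of pixel intensities. This is a
    simplification: standard SSIM is computed over local windows and averaged.
    """
    x = list(x)
    y = list(y)
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")

    # Stabilising constants from the usual SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    mu_x = fmean(x)
    mu_y = fmean(y)
    var_x = fmean([(a - mu_x) * (a - mu_x) for a in x])
    var_y = fmean([(b - mu_y) * (b - mu_y) for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])

    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical images score 1, and the score drops as luminance, contrast, or structure diverge, so a value of 0.893 indicates generated frames that are structurally close to the ground truth but not identical.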
We propose a novel approach for the synthesis of sign language videos using GANs. We extend the previous work of Stoll et al. by using the human semantic parser of the Soft-Gated Warping-GAN to produce photorealistic videos guided by region-level spatial layouts. Synthesizing target poses improves performance on independent and contrasting signers. We therefore evaluated our system on the highly heterogeneous MS-ASL dataset, which contains over 200 signers, obtaining an SSIM of 0.893. Furthermore, we introduce a periodic weighting approach for the generator that reactivates training and leads to quantitatively better results.
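The abstract does not specify the form of the periodic weighting, so the following is only a hypothetical sketch of what a periodically varying generator loss weight could look like. The cosine waveform, the names `periodic_lambda` and `generator_loss`, the default period and bounds, and the choice to weight a reconstruction term against an adversarial term are all assumptions for illustration, not the authors' method.

```python
import math


def periodic_lambda(step, period=1000, lam_min=1.0, lam_max=10.0):
    """Hypothetical cosine schedule cycling between lam_max and lam_min.

    The weight starts at lam_max, falls to lam_min at the middle of each
    period, and returns to lam_max when the next period begins.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    phase = (step % period) / period  # position within the cycle, in [0, 1)
    return lam_min + 0.5 * (lam_max - lam_min) * (1.0 + math.cos(2.0 * math.pi * phase))


def generator_loss(adv_loss, recon_loss, step, **schedule):
    """Assumed objective: adversarial term plus a periodically weighted
    reconstruction term. Which terms are weighted is not stated in the source."""
    return adv_loss + periodic_lambda(step, **schedule) * recon_loss
```

Under these assumptions, the weight at step 0 is `lam_max`, at step 500 it is `lam_min`, and the cycle repeats every 1000 steps. The intuition is that periodically changing the balance between loss terms can perturb a generator that has stalled, which is one plausible reading of "reactivates training".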