CV | AI | Dec 20, 2020

Can Everybody Sign Now? Exploring Sign Language Video Generation from 2D Poses

arXiv:2012.10941v2 | 39 citations
AI Analysis

This work identifies a critical limitation in current video generation models for creating realistic sign language content, which is important for the deaf community and accessibility technologies.

This paper explores the generation of sign language videos from 2D pose skeletons using state-of-the-art deep learning models for motion transfer. The evaluation on the How2Sign dataset shows that current models are insufficient for generating adequate sign language videos due to a lack of detail in hand movements.

Recent work has addressed the generation of human poses, represented by 2D/3D coordinates of human joints, for sign language. We use state-of-the-art deep learning models for motion transfer and evaluate them on How2Sign, an American Sign Language dataset, to generate videos of signers performing sign language given a 2D pose skeleton. We evaluate the generated videos quantitatively and qualitatively, showing that current models are not sufficient to generate adequate sign language videos due to a lack of detail in the hands.
