CL, CV, LG, Feb 11, 2022

Including Facial Expressions in Contextual Embeddings for Sign Language Generation

arXiv:2202.05383v1222 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of unnatural sign language generation for deaf and hard-of-hearing communities and represents an incremental advance.

The paper tackles the lack of expressivity in sign language generation by incorporating facial expressions into contextual embeddings, and it reports that the proposed Dual Encoder Transformer model improves generation quality.

State-of-the-art sign language generation frameworks lack expressivity and naturalness, which is the result of focusing only on manual signs and neglecting the affective, grammatical, and semantic functions of facial expressions. The purpose of this work is to augment the semantic representation of sign language by grounding facial expressions. We study the effect of modeling the relationship between text, gloss, and facial expressions on the performance of sign generation systems. In particular, we propose a Dual Encoder Transformer that generates manual signs as well as facial expressions by capturing the similarities and differences found in text and sign gloss annotation. We take into consideration the role of facial muscle activity in expressing the intensity of manual signs, and we are the first to employ facial action units in sign language generation. We perform a series of experiments showing that our proposed model improves the quality of automatically generated sign language.

