CV
Aug 24, 2020

Global-local Enhancement Network for NMFs-aware Sign Language Recognition

arXiv:2008.10428v2
74 citations
AI Analysis

This paper addresses sign language recognition for communication accessibility, but its contribution is incremental: it builds on existing SLR methods and adds a new dataset.

The paper tackles an ambiguity in sign language recognition: non-manual features (e.g., facial expressions) can change the meaning of a sign even when the hand gestures are identical. It proposes a Global-local Enhancement Network (GLE-Net) with two streams, one for global context and one for fine-grained cues, and reports experiments on a new dataset (NMFs-CSL) and an existing one (SLR500).

Sign language recognition (SLR) is a challenging problem, involving complex manual features, i.e., hand gestures, and fine-grained non-manual features (NMFs), e.g., facial expressions and mouth shapes. Although manual features are dominant, non-manual features also play an important role in the expression of a sign word. Specifically, many sign words convey different meanings due to non-manual features, even though they share the same hand gestures. This ambiguity introduces great challenges in the recognition of sign words. To tackle the above issue, we propose a simple yet effective architecture called Global-local Enhancement Network (GLE-Net), including two mutually promoted streams towards different crucial aspects of SLR. Of the two streams, one captures the global contextual relationship, while the other stream captures the discriminative fine-grained cues. Moreover, due to the lack of datasets explicitly focusing on this kind of feature, we introduce the first non-manual-features-aware isolated Chinese sign language dataset (NMFs-CSL) with a total vocabulary size of 1,067 sign words in daily life. Extensive experiments on NMFs-CSL and SLR500 datasets demonstrate the effectiveness of our method.
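
The two-stream idea in the abstract can be illustrated with a small sketch. The abstract does not say how GLE-Net combines its streams, so this example assumes the simplest option, a weighted average of each stream's class probabilities; it is not the authors' implementation. The function names, the `weight` parameter, the labels, and the logit values are all hypothetical. It shows a case where the global stream cannot separate two signs that share hand gestures, while the local stream's fine-grained cue breaks the tie.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(global_logits, local_logits, weight=0.5):
    """Average the two streams' class probabilities.

    Assumption: plain late fusion by weighted averaging. The abstract only
    says the streams are "mutually promoted"; it does not give a fusion rule.
    """
    if len(global_logits) != len(local_logits):
        raise ValueError("both streams must score the same set of classes")
    global_probs = softmax(global_logits)
    local_probs = softmax(local_logits)
    return [
        weight * g + (1.0 - weight) * l
        for g, l in zip(global_probs, local_probs)
    ]


def predict(global_logits, local_logits, labels, weight=0.5):
    """Return the label with the highest fused probability, plus all probabilities."""
    probs = fuse_streams(global_logits, local_logits, weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical example: sign_A and sign_B share the same hand gestures,
# so the global stream scores them identically. The local stream, which
# looks at fine-grained cues such as the face, prefers sign_B.
labels = ["sign_A", "sign_B", "sign_C"]
global_logits = [2.0, 2.0, -1.0]
local_logits = [0.5, 2.5, 0.0]

label, probs = predict(global_logits, local_logits, labels)
print(label, [round(p, 3) for p in probs])
```

With these made-up scores, the fused prediction is `sign_B`: the global stream alone is tied between the first two classes, and the local stream's preference decides the outcome.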

