CV | Dec 1, 2020

Pose-based Sign Language Recognition using GCN and BERT

arXiv:2012.00781v1 | 106 citations
AI Analysis

This research provides an incremental improvement in sign language recognition accuracy, which can help bridge communication gaps for the hearing and vocally impaired community.

This paper addresses word-level sign language recognition (WSLR) with a pose-based approach that captures spatial and temporal information separately and combines them by late fusion. On the WLASL dataset, the model improves prediction accuracy by up to 5% over state-of-the-art pose-based methods.

Sign language recognition (SLR) plays a crucial role in bridging the communication gap between the hearing and vocally impaired community and the rest of society. Word-level sign language recognition (WSLR) is the first important step towards understanding and interpreting sign language. However, recognizing signs from videos is a challenging task, as the meaning of a word depends on a combination of subtle body motions, hand configurations, and other movements. Recent pose-based architectures for WSLR either model the spatial and temporal dependencies among the poses in different frames simultaneously, or model only the temporal information without fully utilizing the spatial information. We tackle the problem of WSLR using a novel pose-based approach, which captures spatial and temporal information separately and performs late fusion. Our proposed architecture explicitly captures the spatial interactions in the video using a Graph Convolutional Network (GCN). The temporal dependencies between the frames are captured using Bidirectional Encoder Representations from Transformers (BERT). Experimental results on WLASL, a standard word-level sign language recognition dataset, show that our model significantly outperforms state-of-the-art pose-based methods, improving prediction accuracy by up to 5%.
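
The abstract's two-branch design can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy three-joint chain skeleton, a single standard graph-convolution layer, an unparameterized self-attention step standing in for BERT, and late fusion by averaging the two branches' class probabilities. The abstract does not specify the fusion rule, the layer sizes, or the skeleton graph, so every weight and helper name below is hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def mean_rows(rows):
    """Average a list of equal-length vectors element-wise."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) ** -0.5 for row in a]
    return [[d[i] * a[i][j] * d[j] for j in range(n)] for i in range(n)]

def gcn_layer(a_hat, x, w):
    """One graph convolution over the joints of a single frame: ReLU(A_hat X W)."""
    h = matmul(matmul(a_hat, x), w)
    return [[max(0.0, v) for v in row] for row in h]

def self_attention(frames):
    """Scaled dot-product self-attention across frames (assumed stand-in for BERT)."""
    dim = len(frames[0])
    out = []
    for q in frames:
        scores = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in frames])
        out.append([sum(s * f[i] for s, f in zip(scores, frames)) for i in range(dim)])
    return out

def late_fusion(spatial_logits, temporal_logits):
    """Assumed fusion rule: average the per-branch class probabilities."""
    ps, pt = softmax(spatial_logits), softmax(temporal_logits)
    return [(a + b) / 2 for a, b in zip(ps, pt)]

# Toy input: 2 frames, 3 joints, (x, y) coordinates per joint.
video = [
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.1], [1.0, 0.2], [1.2, 1.0]],
]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # hypothetical chain skeleton
a_hat = normalize_adjacency(adj)

# Hypothetical fixed weights; a real model would learn these.
w_gcn = [[0.5, -0.5], [0.25, 0.75]]
w_cls_spatial = [[1.0, -1.0], [0.5, 0.5]]
w_cls_temporal = [[0.1 * (i + 1), -0.05 * i] for i in range(6)]

# Spatial branch: graph convolution per frame, then pool over joints and frames.
spatial_feat = mean_rows([mean_rows(gcn_layer(a_hat, frame, w_gcn)) for frame in video])
spatial_logits = matmul([spatial_feat], w_cls_spatial)[0]

# Temporal branch: flatten each frame's joints, attend across frames, then pool.
flat_frames = [[c for joint in frame for c in joint] for frame in video]
temporal_feat = mean_rows(self_attention(flat_frames))
temporal_logits = matmul([temporal_feat], w_cls_temporal)[0]

# Late fusion: each branch produces its own class scores before they are combined.
probs = late_fusion(spatial_logits, temporal_logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

The point of the sketch is the ordering: each branch reduces the pose sequence to class scores independently, and the branches meet only at the final step. This contrasts with the joint spatio-temporal modeling that the abstract attributes to earlier pose-based architectures.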

