CV, CL
Aug 11, 2020

Real-Time Sign Language Detection using Human Pose Estimation

arXiv:2008.04637v2
83 citations
AI Analysis

This paper addresses accessibility in videoconferencing for deaf and hard-of-hearing users, though the contribution appears incremental, since it builds on existing pose estimation and classification techniques.

The paper tackles real-time sign language detection for videoconferencing by extracting optical flow features from human pose estimation. A linear classifier on these features reaches 80% accuracy on the DGS Corpus, and a recurrent model applied directly to the input reaches up to 91% accuracy while still running in under 4 ms.

We propose a lightweight real-time sign language detection model, as we identify the need for such a case in videoconferencing. We extract optical flow features based on human pose estimation and, using a linear classifier, show these features are meaningful with an accuracy of 80%, evaluated on the DGS Corpus. Using a recurrent model directly on the input, we see improvements of up to 91% accuracy, while still working under 4ms. We describe a demo application to sign language detection in the browser in order to demonstrate its usage possibility in videoconferencing applications.
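
As a rough illustration of what "optical flow features based on human pose estimation" could look like, the sketch below computes, for each keypoint, the distance it moved between two consecutive frames, scaled by the frame rate. It then passes one frame of those features through a linear classifier with a sigmoid output. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `pose_optical_flow` and `signing_probability`, the 2D keypoint layout, and the weights in the usage example are all hypothetical placeholders.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) keypoint from any pose estimator


def pose_optical_flow(poses: Sequence[Sequence[Point]], fps: float) -> List[List[float]]:
    """Per-keypoint movement between consecutive frames, scaled by frame rate.

    poses[t][k] is the (x, y) position of keypoint k in frame t.
    Returns len(poses) - 1 rows, one value per keypoint in each row.
    """
    if len(poses) < 2:
        raise ValueError("need at least two frames to compute movement")
    flow = []
    for prev, curr in zip(poses, poses[1:]):
        if len(prev) != len(curr):
            raise ValueError("every frame must have the same number of keypoints")
        # Euclidean distance moved by each keypoint, multiplied by fps so that
        # the value does not depend on the capture frame rate.
        flow.append([
            math.hypot(cx - px, cy - py) * fps
            for (px, py), (cx, cy) in zip(prev, curr)
        ])
    return flow


def signing_probability(flow_frame: Sequence[float],
                        weights: Sequence[float],
                        bias: float) -> float:
    """Linear classifier over one frame of flow features (hypothetical weights)."""
    logit = sum(w * f for w, f in zip(weights, flow_frame)) + bias
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Usage: three frames, two keypoints; only the first keypoint moves.
frames = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(3.0, 4.0), (1.0, 1.0)],
    [(3.0, 4.0), (1.0, 1.0)],
]
flow = pose_optical_flow(frames, fps=2.0)
prob = signing_probability(flow[0], weights=[0.5, 0.5], bias=-1.0)
```

In a real system the weights would be learned from labelled data such as the DGS Corpus, and the recurrent model mentioned in the abstract would consume a sequence of frames rather than a single one.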

