Real-Time Sign Language Detection using Human Pose Estimation
This paper addresses accessibility in videoconferencing for deaf or hard-of-hearing users, though the contribution appears incremental, since it builds on existing pose estimation and classification techniques.
The paper tackles real-time sign language detection for videoconferencing by extracting optical flow features from human pose estimation, achieving 80% accuracy with a linear classifier and up to 91% with a recurrent model, while keeping processing time under 4 ms.
We propose a lightweight real-time sign language detection model, motivated by the need for such detection in videoconferencing. We extract optical flow features based on human pose estimation and, using a linear classifier, show that these features are meaningful, reaching 80% accuracy on the DGS Corpus. Using a recurrent model directly on the input, we see accuracy improve to as much as 91%, while still running in under 4 ms. We describe a demo application for sign language detection in the browser to demonstrate its potential use in videoconferencing applications.
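
To make the feature extraction concrete, the following is a minimal sketch of one plausible reading of "optical flow features based on human pose estimation": the per-keypoint displacement between consecutive frames, scaled by the frame rate. The abstract does not give the exact formula, so the frame-rate scaling, the 2D keypoint layout, and the names `pose_optical_flow` and `linear_score` are assumptions made for illustration, not the authors' implementation. The logistic scorer merely stands in for "a linear classifier"; its weights are hypothetical and would be learned from labeled data.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (x, y) position of one pose keypoint
Pose = Sequence[Point]        # all keypoints detected in one frame


def pose_optical_flow(poses: Sequence[Pose], fps: float) -> List[List[float]]:
    """Per-keypoint motion magnitude between consecutive frames.

    Returns len(poses) - 1 rows with one value per keypoint.
    Scaling by fps expresses motion per second rather than per frame
    (an assumption of this sketch, not stated in the abstract).
    """
    flows = []
    for prev, curr in zip(poses, poses[1:]):
        row = [
            math.hypot(cx - px, cy - py) * fps
            for (px, py), (cx, cy) in zip(prev, curr)
        ]
        flows.append(row)
    return flows


def linear_score(flow_row: Sequence[float],
                 weights: Sequence[float],
                 bias: float) -> float:
    """Logistic score in (0, 1) from a linear function of the flow features.

    The weights and bias are hypothetical placeholders; a real system
    would fit them on annotated frames.
    """
    z = sum(w * f for w, f in zip(weights, flow_row)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Two frames, two keypoints: the first keypoint moves by (3, 4), the second is still.
frames = [[(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)]]
print(pose_optical_flow(frames, fps=2.0))  # [[10.0, 0.0]]
```

A frame whose score exceeds a chosen threshold would be labeled as signing. The recurrent model mentioned in the abstract would instead consume the sequence of flow rows, which this sketch does not attempt.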