CV · Dec 29, 2017

Learning Deep and Compact Models for Gesture Recognition

arXiv:1712.10136v1 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient gesture recognition on mobile devices, but the advance is incremental: it builds on existing deep learning and compression techniques.

The paper proposes a joint 3DCNN-LSTM model for compact, accurate gesture recognition from videos. The model achieves close to state-of-the-art accuracy on the ChaLearn dataset with only half the model size. Knowledge distillation followed by model compression then shrinks it to less than 1 MB, at the cost of a 7% drop in accuracy, making it suitable for real-time use on mobile devices.

We look at the problem of developing a compact and accurate model for gesture recognition from videos in a deep-learning framework. Towards this we propose a joint 3DCNN-LSTM model that is end-to-end trainable and is shown to be better suited to capture the dynamic information in actions. The solution achieves close to state-of-the-art accuracy on the ChaLearn dataset, with only half the model size. We also explore ways to derive a much more compact representation in a knowledge distillation framework followed by model compression. The final model is less than 1 MB in size, which is less than one hundredth of our initial model, with a drop of 7% in accuracy, and is suitable for real-time gesture recognition on mobile devices.
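The abstract mentions a knowledge distillation framework but does not spell out the loss. As a rough illustration only, the sketch below shows the commonly used temperature-softened distillation objective (a soft term matching the teacher's output distribution plus a hard term on the true label). It is an assumption that the paper uses this particular formulation; the function names, the `temperature` and `alpha` parameters, and their default values are hypothetical and chosen here for illustration.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum for s in scaled]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft (teacher-matching) and a hard (label) term.

    Assumed formulation, not taken from the paper:
      soft = T^2 * KL(teacher_T || student_T)
      hard = cross-entropy of the student at T = 1
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    # Standard cross-entropy against the ground-truth class index.
    hard = -log_softmax(student_logits)[true_label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard

# A student that agrees with the teacher scores a lower loss than one that does not.
teacher = [0.0, 4.0, 1.0]
agreeing = distillation_loss([0.0, 5.0, 0.0], teacher, true_label=1)
disagreeing = distillation_loss([5.0, 0.0, 0.0], teacher, true_label=1)
print(agreeing < disagreeing)
```

In a setting like the one described, the teacher would be the larger 3DCNN-LSTM model and the student the compact one; the logits here are plain lists so the sketch runs without any deep-learning library.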

Code Implementations: 1 repo
