CV · Apr 19, 2018

Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition

arXiv:1804.07187v2 · 124 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of capturing long-term temporal relations in video-based hand gesture recognition, offering an incremental improvement over existing methods.

The paper tackles hand gesture recognition by proposing Motion Fused Frames (MFFs), a data-level fusion strategy that integrates motion information into static images so that they better represent spatio-temporal states. It reports competitive accuracies of 96.28% on Jester and 57.4% on ChaLearn LAP IsoGD, and a state-of-the-art 84.7% on the NVIDIA Dynamic Hand Gesture dataset.

Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information into static images as better representatives of spatio-temporal states of an action. MFFs can be used as input to any deep learning architecture with very little modification to the network. We evaluate MFFs on hand gesture recognition tasks using three video datasets (Jester, ChaLearn LAP IsoGD, and NVIDIA Dynamic Hand Gesture) that require capturing long-term temporal relations of hand movements. Our approach obtains very competitive performance on the Jester and ChaLearn benchmarks, with classification accuracies of 96.28% and 57.4%, respectively, while achieving state-of-the-art performance with 84.7% accuracy on the NVIDIA benchmark.
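To make "data level fusion" concrete, here is a minimal sketch of one way motion could be fused into a static image: stacking motion frames onto the image along the channel axis. The abstract does not give these details, so the function name `motion_fused_frame`, the use of two-channel (x, y) optical-flow frames, the array shapes, and the channel ordering are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def motion_fused_frame(rgb, flows):
    """Stack motion frames onto a static image along the channel axis.

    Hypothetical helper for illustration only.
    rgb:   array of shape (H, W, 3), the static image.
    flows: sequence of arrays of shape (H, W, 2), assumed to hold the
           x and y components of optical flow for nearby frames.
    Returns a float32 array of shape (H, W, 3 + 2 * len(flows)).
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    flows = [np.asarray(f, dtype=np.float32) for f in flows]

    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    for flow in flows:
        if flow.shape != rgb.shape[:2] + (2,):
            raise ValueError("each flow frame must have shape (H, W, 2)")

    # Fusion happens at the data level: the network sees one multi-channel image.
    return np.concatenate([rgb, *flows], axis=-1)


# Usage with placeholder data: one image fused with three flow frames.
image = np.zeros((112, 112, 3))
flow_frames = [np.zeros((112, 112, 2)) for _ in range(3)]
fused = motion_fused_frame(image, flow_frames)
print(fused.shape)  # (112, 112, 9)
```

Under this assumption, the "very little modification" to the network would plausibly amount to widening the first convolutional layer so that it accepts the extra input channels, while the rest of the architecture stays unchanged.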

Code Implementations: 1 repo
