CV · LG · IV · Sep 24, 2019

Learning deep representations for video-based intake gesture detection

arXiv:1909.10695v1 · 40 citations
Originality: Incremental advance
AI Analysis

This work addresses dietary monitoring by enabling video-based detection of intake gestures, which is a novel application but incremental in method.

The study tackled the problem of automatically detecting individual intake gestures during eating occasions using video data, achieving an F1 score of 0.858 with deep learning architectures.

Automatic detection of individual intake gestures during eating occasions has the potential to improve dietary monitoring and support dietary recommendations. Existing studies typically make use of on-body solutions such as inertial and audio sensors, while video is used as ground truth. Intake gesture detection directly based on video has rarely been attempted. In this study, we address this gap and show that deep learning architectures can successfully be applied to the problem of video-based detection of intake gestures. For this purpose, we collect and label video data of eating occasions using 360-degree video of 102 participants. Applying state-of-the-art approaches from video action recognition, our results show that (1) the best model achieves an $F_1$ score of 0.858, (2) appearance features contribute more than motion features, and (3) temporal context in the form of multiple video frames is essential for top model performance.
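
For readers unfamiliar with the metric reported above, the $F_1$ score is the harmonic mean of precision and recall. The following is a minimal sketch, assuming detections have already been counted as true positives, false positives, and false negatives; the function name `f1_score` and the example counts are hypothetical, and the paper's actual matching scheme for deciding what counts as a correct detection is not described in this summary.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall) from detection counts."""
    # F1 = 2*TP / (2*TP + FP + FN), algebraically equal to 2*P*R / (P + R).
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        # No detections and no ground-truth events: define the score as 0.0.
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical counts for illustration only; these are not results from the paper.
example = f1_score(true_positives=8, false_positives=2, false_negatives=2)
print(example)
```

Counting from raw detections avoids dividing by zero when precision or recall is undefined, which is why the sketch works from counts and not from the two ratios.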

Code Implementations: 1 repo