CV, Jan 29, 2024

Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model

Till Grutschus, Ola Karrar, Emir Esenov, Ekta Vats

arXiv:2401.16280v12.0h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses fall detection for elderly care or healthcare monitoring, but it is incremental as it adapts existing models with simple preprocessing techniques.

The paper tackles human fall detection in untrimmed videos by applying a large video understanding foundation model together with a cutup-based temporal action localization method, achieving a state-of-the-art F1 score of 0.96 on the HQFSD dataset.

This work explores the performance of a large video understanding foundation model on the downstream task of human fall detection on untrimmed video and leverages a pretrained vision transformer for multi-class action detection, with classes: "Fall", "Lying" and "Other/Activities of daily living (ADL)". A method for temporal action localization that relies on a simple cutup of untrimmed videos is demonstrated. The methodology includes a preprocessing pipeline that converts datasets with timestamp action annotations into labeled datasets of short action clips. Simple and effective clip-sampling strategies are introduced. The effectiveness of the proposed method has been empirically evaluated on the publicly available High-Quality Fall Simulation Dataset (HQFSD). The experimental results validate the performance of the proposed pipeline. The results are promising for real-time application, and the falls are detected on video level with a state-of-the-art 0.96 F1 score on the HQFSD dataset under the given experimental settings. The source code will be made available on GitHub.
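
To make the cutup idea concrete, here is a minimal sketch of how timestamp annotations might be turned into labeled fixed-length clips. It is not the authors' implementation: the names `Annotation`, `cutup`, `label_clip`, and `video_has_fall` are hypothetical, and the clip length, stride, majority-overlap labeling rule, and "any clip predicted as Fall" video-level rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A timestamp annotation on an untrimmed video (times in seconds)."""
    start: float
    end: float
    label: str  # e.g. "Fall", "Lying", or "Other"


def cutup(duration, clip_len, stride):
    """Cut [0, duration] into fixed-length windows; a trailing partial window is dropped."""
    if duration < clip_len:
        return []
    n_clips = int((duration - clip_len) // stride) + 1
    return [(i * stride, i * stride + clip_len) for i in range(n_clips)]


def label_clip(clip, annotations, default="Other"):
    """Assign the label covering most of the clip (assumed rule).

    Time not covered by any annotation counts toward `default`.
    Annotations are assumed not to overlap each other.
    """
    c_start, c_end = clip
    seconds_by_label = {}
    covered = 0.0
    for a in annotations:
        overlap = min(c_end, a.end) - max(c_start, a.start)
        if overlap > 0:
            seconds_by_label[a.label] = seconds_by_label.get(a.label, 0.0) + overlap
            covered += overlap
    remaining = (c_end - c_start) - covered
    if remaining > 0:
        seconds_by_label[default] = seconds_by_label.get(default, 0.0) + remaining
    return max(seconds_by_label, key=seconds_by_label.get)


def video_has_fall(clip_labels):
    """Video-level decision (assumed rule): flag the video if any clip is labeled "Fall"."""
    return any(label == "Fall" for label in clip_labels)


# Hypothetical 10-second video with one fall followed by lying on the floor.
annotations = [Annotation(2.5, 4.5, "Fall"), Annotation(4.5, 9.5, "Lying")]
clips = cutup(duration=10.0, clip_len=2.0, stride=2.0)
labels = [label_clip(c, annotations) for c in clips]
print(list(zip(clips, labels)))
print("fall detected:", video_has_fall(labels))
```

In a real pipeline the per-clip labels would come from the classifier at inference time rather than from annotations; the same video-level aggregation could then be applied to the predicted labels.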
