CVAILGOct 12, 2024

Advanced Gesture Recognition for Autism Spectrum Disorder Detection: Integrating YOLOv7, Video Augmentation, and VideoMAE for Naturalistic Video Analysis

arXiv:2410.09339v3h-index: 3
Originality Incremental advance
AI Analysis

This work addresses the need for automated, contactless ASD diagnosis tools, but it is incremental as it builds on existing datasets and methods.

This study tackled the problem of detecting autism spectrum disorder (ASD) in children by analyzing videos of repetitive motor behaviors in natural environments, achieving 95% accuracy and surpassing previous state-of-the-art results.

Deep learning and contactless sensing technologies have significantly advanced the automated assessment of human behaviors in healthcare. In the context of autism spectrum disorder (ASD), repetitive motor behaviors such as spinning, head banging, and arm flapping are key indicators for diagnosis. This study focuses on distinguishing between children with ASD and typically developed (TD) peers by analyzing videos captured in natural, uncontrolled environments. Using the publicly available Self-Stimulatory Behavior Dataset (SSBD), we address the classification task as a binary problem, ASD vs. TD, based on stereotypical repetitive gestures. We adopt a pipeline integrating YOLOv7-based detection, extensive video augmentations, and the VideoMAE framework, which efficiently captures both spatial and temporal features through a high-ratio masking and reconstruction strategy. Our proposed approach achieves 95% accuracy, 0.93 precision, 0.94 recall, and 0.94 F1 score, surpassing the previous state-of-the-art by a significant margin. These results demonstrate the effectiveness of combining advanced object detection, robust data augmentation, and masked autoencoder-based video modeling for reliable ASD vs. TD classification in naturalistic settings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes