Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model
This work addresses public safety concerns by enabling non-invasive detection of alcohol intoxication, but the contribution is incremental because it builds on existing video analysis methods.
The study detects alcohol intoxication from facial video sequences using a recurrent fusion model that integrates facial landmark analysis with spatiotemporal features, achieving 95.82% accuracy, 0.977 precision, and 0.97 recall.
Alcohol consumption is a significant public health concern and a major cause of accidents and fatalities worldwide. This study introduces a novel video-based facial sequence analysis approach dedicated to the detection of alcohol intoxication. The method integrates facial landmark analysis via a Graph Attention Network (GAT) with spatiotemporal visual features extracted using a 3D ResNet. These features are dynamically fused with adaptive prioritization to enhance classification performance. Additionally, we introduce a curated dataset comprising 3,542 video segments derived from 202 individuals to support training and evaluation. Our model is compared against two baselines: a custom 3D-CNN and a VGGFace+LSTM architecture. Experimental results show that our approach achieves 95.82% accuracy, 0.977 precision, and 0.97 recall, outperforming prior methods. The findings demonstrate the model's potential for practical deployment in public safety systems for non-invasive, reliable alcohol intoxication detection.
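
The abstract does not spell out how the two feature streams are "dynamically fused with adaptive prioritization". One common reading is an input-dependent weighted sum, and the sketch below illustrates that idea only. It is not the authors' implementation: the function names, the gate vectors, and the use of a softmax over two scalar scores are all assumptions, and the short lists stand in for embeddings that a GAT and a 3D ResNet would produce.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(landmark_feat, video_feat, gate_landmark, gate_video):
    """Fuse two equal-length feature vectors with input-dependent weights.

    All arguments are hypothetical stand-ins:
      landmark_feat  - embedding of the facial landmark graph
      video_feat     - spatiotemporal embedding of the video clip
      gate_landmark, gate_video - gate vectors (learned in a real model)
    """
    if len(landmark_feat) != len(video_feat):
        raise ValueError("both streams must share one embedding size")

    # One scalar relevance score per stream (dot product with its gate vector).
    score_l = sum(w * x for w, x in zip(gate_landmark, landmark_feat))
    score_v = sum(w * x for w, x in zip(gate_video, video_feat))

    # The softmax turns the scores into weights that sum to 1, so the stream
    # with the higher score for this particular input dominates the result.
    alpha_l, alpha_v = softmax([score_l, score_v])

    fused = [alpha_l * l + alpha_v * v
             for l, v in zip(landmark_feat, video_feat)]
    return fused, (alpha_l, alpha_v)


if __name__ == "__main__":
    fused, weights = gated_fusion(
        landmark_feat=[1.0, 0.0],
        video_feat=[0.0, 1.0],
        gate_landmark=[2.0, 0.0],   # favours the landmark stream here
        gate_video=[0.0, 0.5],
    )
    print("weights:", weights)
    print("fused:", fused)
```

Because the weights are recomputed for every input, a clip with unreliable landmarks (for example, a partly occluded face) could lean on the video stream, and vice versa. In a trained network the gate vectors would be learned jointly with the classifier rather than set by hand.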