CV · Jan 7

Combining facial videos and biosignals for stress estimation during driving

arXiv:2601.04376v1
Originality: Incremental advance
AI Analysis

This work addresses driver stress recognition with a method that combines facial and physiological data. The advance is incremental, since it builds on existing techniques such as EMOCA and Transformers.

The paper tackles stress estimation during distracted driving by analyzing 3D facial geometry and physiological signals. Its cross-modal attention fusion achieves an AUROC of 92% and an accuracy of 86.7%.
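To make the fusion idea concrete, here is a minimal sketch of cross-modal attention, in which each time step of one modality (for example, facial coefficients) attends over all time steps of another (for example, physiological signals). It is an illustration under simplifying assumptions, not the paper's architecture: there are no learned projections, heads, or Transformer layers, and the names `cross_modal_attention`, `face`, and `physio` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across two modalities.

    queries: T_a x d list of lists (modality A, e.g. facial features)
    keys, values: T_b x d and T_b x d_v lists (modality B, e.g. biosignals)
    Returns a T_a x d_v list: one fused vector per query time step.
    Assumption: inputs are already embedded in a shared dimension d.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    dim_v = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query step to every key step of the other modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the other modality's value vectors.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return fused


# Toy inputs (made up for illustration): 3 facial steps, 2 physiological steps.
face = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
physio = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_modal_attention(face, physio, physio)
```

In a full model the queries, keys, and values would pass through learned linear projections, and the fused sequence would feed a classifier head; those parts are omitted here.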

Reliable stress recognition from facial videos is challenging due to stress's subjective nature and voluntary facial control. While most methods rely on Facial Action Units, the role of disentangled 3D facial geometry remains underexplored. We address this by analyzing stress during distracted driving using EMOCA-derived 3D expression and pose coefficients. Paired hypothesis tests between baseline and stressor phases reveal that 41 of 56 coefficients show consistent, phase-specific stress responses comparable to physiological markers. Building on this, we propose a Transformer-based temporal modeling framework and assess unimodal, early-fusion, and cross-modal attention strategies. Cross-Modal Attention fusion of EMOCA and physiological signals achieves the best performance (AUROC 92%, Accuracy 86.7%), with EMOCA-gaze fusion also competitive (AUROC 91.8%). This highlights the effectiveness of temporal modeling and cross-modal attention for stress recognition.
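The abstract does not say which paired hypothesis test was used. As one possible illustration, the sketch below runs an exact sign-flip permutation test on per-subject differences between a baseline phase and a stressor phase for a single coefficient. The choice of test, the function name `paired_sign_flip_test`, and the toy values are all assumptions made for this example and are not taken from the paper.

```python
from itertools import product


def paired_sign_flip_test(baseline, stressor):
    """Exact two-sided sign-flip permutation test for paired samples.

    Under the null hypothesis of no phase effect, each per-subject
    difference is equally likely to be positive or negative, so every
    sign pattern is enumerated and compared with the observed statistic.
    Enumeration costs 2**n, so this suits only small subject counts.
    """
    diffs = [s - b for b, s in zip(baseline, stressor)]
    observed = abs(sum(diffs))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(sg * d for sg, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Made-up per-subject means of one coefficient in each phase (illustration only).
baseline = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]
stressor = [0.20, 0.25, 0.15, 0.22, 0.18, 0.21]
p_value = paired_sign_flip_test(baseline, stressor)
```

Applying such a test to each of the 56 coefficients would also call for a multiple-comparison correction; whether and how the paper does this is not stated in the abstract.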

