CV · Jan 7

Combining facial videos and biosignals for stress estimation during driving

Paraskevi Valergaki, Vassilis C. Nicodemou, Iason Oikonomidis, Antonis Argyros, Anastasios Roussos

arXiv:2601.04376v11.5

Originality: Incremental advance

AI Analysis

This work addresses driver stress recognition with a method that combines facial and physiological data. It is rated incremental because it builds on existing techniques such as EMOCA and Transformers.

The paper tackles stress estimation during distracted driving by analyzing 3D facial geometry and physiological signals, achieving an AUROC of 92% and an accuracy of 86.7% with cross-modal attention fusion.

Reliable stress recognition from facial videos is challenging due to stress's subjective nature and voluntary facial control. While most methods rely on Facial Action Units, the role of disentangled 3D facial geometry remains underexplored. We address this by analyzing stress during distracted driving using EMOCA-derived 3D expression and pose coefficients. Paired hypothesis tests between baseline and stressor phases reveal that 41 of 56 coefficients show consistent, phase-specific stress responses comparable to physiological markers. Building on this, we propose a Transformer-based temporal modeling framework and assess unimodal, early-fusion, and cross-modal attention strategies. Cross-Modal Attention fusion of EMOCA and physiological signals achieves the best performance (AUROC 92%, Accuracy 86.7%), with EMOCA-gaze fusion also competitive (AUROC 91.8%). This highlights the effectiveness of temporal modeling and cross-modal attention for stress recognition.
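To make the fusion idea concrete, here is a minimal sketch of scaled dot-product cross-modal attention, in which time steps of one modality (say, facial coefficients) attend over time steps of another (say, physiological signals). It is not the authors' implementation: the function names, the toy two-dimensional features, and the absence of learned query/key/value projections, multiple heads, and positional encodings are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Each query step (modality A) attends over all steps of modality B.

    queries: T_q x d list of lists (modality A features)
    keys:    T_k x d list of lists (modality B features)
    values:  T_k x d_v list of lists (modality B features)
    Returns (fused, weights): T_q x d_v outputs and T_q x T_k attention weights.
    Hypothetical helper: no learned projections are applied here.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    fused, weights = [], []
    for q in queries:
        # Similarity of this query step to every key step.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w_t * v[j] for w_t, v in zip(w, values))
               for j in range(len(values[0]))]
        weights.append(w)
        fused.append(out)
    return fused, weights


# Toy inputs (made up): 3 facial time steps attend over 2 physiological steps.
face = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
physio = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = cross_modal_attention(face, physio, physio)
```

Each row of `weights` sums to one, so every fused vector is a convex combination of the other modality's time steps. In a full Transformer-style model the inputs would first pass through learned linear projections, and the fused sequence would feed a classifier.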
