Boosting Micro-Expression Analysis via Prior-Guided Video-Level Regression
This work addresses the challenge of capturing complex temporal dynamics in micro-expression analysis for applications such as emotion detection. The contribution is incremental, since it builds on existing video-level regression frameworks.
The paper proposes a prior-guided video-level regression method for micro-expression analysis that improves spotting of the onset, apex, and offset phases. It reports state-of-the-art results, with an STRS of 0.0562 on CAS(ME)$^3$ and 0.2000 on SAMMLV.
Micro-expressions (MEs) are involuntary, low-intensity, and short-duration facial expressions that often reveal an individual's genuine thoughts and emotions. Most existing ME analysis methods rely on window-level classification with fixed window sizes and hard decisions, which limits their ability to capture the complex temporal dynamics of MEs. Although recent approaches have adopted video-level regression frameworks to address some of these challenges, interval decoding still depends on manually predefined, window-based methods, leaving the issue only partially mitigated. In this paper, we propose a prior-guided video-level regression method for ME analysis. We introduce a scalable interval selection strategy that comprehensively considers the temporal evolution, duration, and class distribution characteristics of MEs, enabling precise spotting of the onset, apex, and offset phases. In addition, we introduce a synergistic optimization framework, in which the spotting and recognition tasks share parameters except for the classification heads. This fully exploits complementary information, makes more efficient use of limited data, and enhances the model's capability. Extensive experiments on multiple benchmark datasets demonstrate the state-of-the-art performance of our method, with an STRS of 0.0562 on CAS(ME)$^3$ and 0.2000 on SAMMLV. The code is available at https://github.com/zizheng-guo/BoostingVRME.
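To make the contrast with fixed-window decoding concrete, the following is a minimal sketch of score-driven interval decoding with a duration prior. It is not the authors' interval selection strategy, which also accounts for temporal evolution and class distribution. It only illustrates how onset and offset boundaries can follow the per-frame regression curve instead of a predefined window. The function name `decode_intervals` and the parameters `peak_thresh`, `rel_drop`, `min_len`, and `max_len` are hypothetical, and the scores are assumed to be non-negative per-frame values.

```python
from typing import List, Tuple


def decode_intervals(
    scores: List[float],
    peak_thresh: float = 0.5,  # assumed minimum score for an apex candidate
    rel_drop: float = 0.5,     # assumed boundary level, as a fraction of the apex score
    min_len: int = 3,          # assumed duration prior: shortest accepted interval (frames)
    max_len: int = 15,         # assumed duration prior: longest accepted interval (frames)
) -> List[Tuple[int, int, int]]:
    """Decode (onset, apex, offset) frame indices from non-negative per-frame scores."""
    n = len(scores)
    used = [False] * n
    intervals = []

    # Visit candidate apex frames from the strongest score to the weakest.
    for apex in sorted(range(n), key=lambda i: scores[i], reverse=True):
        if scores[apex] < peak_thresh:
            break  # all remaining frames are weaker than the apex threshold
        if used[apex]:
            continue  # frame already belongs to a stronger interval

        floor = rel_drop * scores[apex]
        onset = offset = apex

        # Grow toward the higher-scoring neighbor until both neighbors fall
        # below the floor or the duration prior caps the interval length.
        while offset - onset + 1 < max_len:
            left = scores[onset - 1] if onset > 0 and not used[onset - 1] else -1.0
            right = scores[offset + 1] if offset < n - 1 and not used[offset + 1] else -1.0
            if max(left, right) < floor:
                break
            if left >= right:
                onset -= 1
            else:
                offset += 1

        # Claim the frames even if the interval is rejected, so weaker frames
        # on the same bump are not reused as separate apex candidates.
        for i in range(onset, offset + 1):
            used[i] = True

        if offset - onset + 1 >= min_len:
            intervals.append((onset, apex, offset))

    return sorted(intervals)


if __name__ == "__main__":
    demo = [0.0, 0.1, 0.4, 0.8, 0.9, 0.7, 0.3, 0.1, 0.0, 0.0, 0.2, 0.6, 0.2, 0.0]
    print(decode_intervals(demo))
```

In this sketch the interval length adapts to the shape of each response bump, while `min_len` and `max_len` reject or cap durations that are implausible for a micro-expression. An isolated one-frame spike, such as frame 11 in the demo sequence, is discarded under the default `min_len`.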