EventFormer: AU Event Transformer for Facial Action Unit Event Detection
This work addresses the need for high-level emotion analysis in real-world applications by moving beyond frame-level AU detection to event detection. The contribution is incremental, however, since it builds on existing AU analysis methods.
The authors tackle the problem of detecting facial action unit (AU) events in video sequences. They propose EventFormer, which they present as the first method to detect AU events directly, by framing the task as a prediction problem over multiple class-specific sets. They report superior performance on the BP4D benchmark dataset under suitable metrics.
Facial action units (AUs) play an indispensable role in human emotion analysis. We observe that although real-world applications urgently need AU-based high-level emotion analysis, the frame-level AU results provided by previous works cannot be used directly for such analysis. Moreover, because AUs are dynamic processes, the use of global temporal information is important, yet it has been largely ignored in the literature. To this end, we propose EventFormer for AU event detection, the first work to detect AU events directly from a video sequence by viewing AU event detection as a prediction problem over multiple class-specific sets. Extensive experiments on BP4D, a commonly used AU benchmark dataset, show the superiority of EventFormer under suitable metrics.
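To make the distinction between frame-level results and AU events concrete, the following minimal sketch shows one way per-frame binary AU labels could be grouped into one set of events per AU class. It is an illustration only and not the EventFormer method: the definition of an event as a maximal run of consecutive active frames, the `(start, end)` representation, the function names, and the toy labels are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumption: an event is an inclusive (start_frame, end_frame) pair.
Event = Tuple[int, int]


def frames_to_events(active: List[int]) -> List[Event]:
    """Group consecutive active frames (1s) into (start, end) events."""
    events: List[Event] = []
    start = None  # index where the current run of active frames began
    for i, a in enumerate(active):
        if a and start is None:
            start = i  # a new run begins
        elif not a and start is not None:
            events.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        # the sequence ended while an AU was still active
        events.append((start, len(active) - 1))
    return events


def video_to_event_sets(frame_labels: Dict[str, List[int]]) -> Dict[str, List[Event]]:
    """Build one set of events per AU class from per-frame binary labels."""
    return {au: frames_to_events(seq) for au, seq in frame_labels.items()}


# Hypothetical toy labels for two AU classes over eight frames.
frame_labels = {
    "AU6": [0, 1, 1, 1, 0, 0, 1, 1],
    "AU12": [0, 0, 1, 1, 1, 1, 0, 0],
}
event_sets = video_to_event_sets(frame_labels)
print(event_sets)
# → {'AU6': [(1, 3), (6, 7)], 'AU12': [(2, 5)]}
```

Each AU class ends up with its own variable-length set of events, which is the kind of target structure that "multiple class-specific sets" suggests. A detector that predicts such sets directly would not need this post-processing step over frame-level outputs.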