CV | Mar 24, 2022

Multi-modal Multi-label Facial Action Unit Detection with Transformer

arXiv:2203.13301v2 | 20 citations | h-index: 7
AI Analysis

This work addresses facial expression analysis for affective computing applications, but it is incremental as it builds on existing transformer and multi-modal approaches.

The paper tackled facial action unit detection in video by proposing a transformer-based model that integrates audio and visual features and learns correlations between action units, achieving better performance than the baseline on the validation dataset.

The Facial Action Coding System is an important approach to facial expression analysis. This paper describes our submission to the third Affective Behavior Analysis in-the-wild (ABAW) 2022 competition. We propose a transformer-based model to detect facial action units (FAUs) in video. Specifically, we first train a multi-modal model to extract both audio and visual features. We then propose an action unit correlation module that learns the relationships between action unit labels and refines the detection results. Experimental results on the validation dataset show that our method outperforms the baseline model, which verifies the effectiveness of the proposed network.
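
As a rough illustration of the pipeline the abstract describes (fuse audio and visual features, then let a transformer relate the per-action-unit representations before predicting each label), here is a minimal PyTorch sketch. It is not the authors' implementation: the class name `AUCorrelationSketch`, the feature dimensions, the number of action units (12), fusion by concatenation, and the learned per-AU embeddings are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

NUM_AUS = 12  # assumption: number of action units to detect


class AUCorrelationSketch(nn.Module):
    """Hypothetical sketch: audio-visual fusion followed by self-attention over AU tokens."""

    def __init__(self, visual_dim=512, audio_dim=128, d_model=256, num_aus=NUM_AUS):
        super().__init__()
        # Assumption: fuse the two modalities by concatenation and a linear projection.
        self.fuse = nn.Linear(visual_dim + audio_dim, d_model)
        # Assumption: one learned embedding per action unit label.
        self.au_embed = nn.Parameter(torch.randn(num_aus, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=512, batch_first=True
        )
        # Self-attention across AU tokens lets each label attend to the others.
        self.correlation = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, visual, audio):
        # visual: (batch, visual_dim), audio: (batch, audio_dim)
        fused = self.fuse(torch.cat([visual, audio], dim=-1))      # (batch, d_model)
        tokens = fused.unsqueeze(1) + self.au_embed.unsqueeze(0)   # (batch, num_aus, d_model)
        tokens = self.correlation(tokens)                          # (batch, num_aus, d_model)
        return self.head(tokens).squeeze(-1)                       # (batch, num_aus) logits


# Usage with random stand-in features for a batch of 8 frames.
model = AUCorrelationSketch()
logits = model(torch.randn(8, 512), torch.randn(8, 128))
targets = torch.randint(0, 2, (8, NUM_AUS)).float()
# Multi-label detection: one independent binary decision per action unit.
loss = nn.BCEWithLogitsLoss()(logits, targets)
probs = torch.sigmoid(logits)
```

Treating each action unit as its own token is one simple way to let the model learn label co-occurrence. The paper's actual correlation module may be structured differently.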
