CV · Jul 9, 2021

A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition

arXiv:2107.04187v2 · 44 citations
Originality: Incremental advance
AI Analysis

This work addresses human affect analysis for human-computer interaction systems. Its contribution is incremental: it builds on existing benchmarks and methods.

The paper tackles action unit (AU) and expression recognition in in-the-wild settings with a multi-modal, multi-task learning method that uses visual and audio information. It reports an AU score of 0.712 and an expression score of 0.477 on the validation set.

Analyzing human affect is vital for human-computer interaction systems. Most methods are developed in restricted scenarios which are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal and multi-task learning method by using both visual and audio information. We use both AU and expression annotations to train the model and apply a sequence model to further extract associations between video frames. We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set. These results demonstrate the effectiveness of our approach in improving model performance.
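To make the multi-task idea concrete, here is a minimal sketch of a joint loss over AU and expression predictions for a single frame. It is an illustration under assumptions, not the authors' implementation: the abstract does not state the loss functions, task weights, or how frames with only one kind of annotation are handled. The function names, the `au_weight` and `expr_weight` parameters, and the choice of binary cross-entropy for AUs (multi-label) plus softmax cross-entropy for expressions (single-label) are all hypothetical.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over AU logits (multi-label, one logit per AU)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def softmax_cross_entropy(logits, label):
    """Cross-entropy for a single-label expression prediction."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(au_logits, expr_logits, au_targets=None, expr_label=None,
                   au_weight=1.0, expr_weight=1.0):
    """Weighted sum of the two task losses (assumed form).

    A task whose annotation is missing (None) contributes nothing, so
    frames labelled for only one task can still be used for training.
    """
    loss = 0.0
    if au_targets is not None:
        loss += au_weight * bce_with_logits(au_logits, au_targets)
    if expr_label is not None:
        loss += expr_weight * softmax_cross_entropy(expr_logits, expr_label)
    return loss
```

In a full model, `au_logits` and `expr_logits` would come from two output heads on top of shared features (here assumed to be fused visual and audio features passed through a sequence model); only the loss combination is sketched.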

