Multi-label Transformer for Action Unit Detection
This work addresses AU detection for affective computing, but it appears incremental as it applies an existing transformer method to a new dataset.
The paper tackles Action Unit (AU) detection by applying a multi-label detection transformer whose multi-head attention learns which facial regions are relevant to each AU, using the ABAW3 challenge dataset of 2M frames.
Action Unit (AU) detection is the branch of affective computing that aims at recognizing unitary facial muscular movements. It is key to unlocking unbiased computational face representations and has therefore attracted great interest in the past few years. One of the main obstacles to building efficient deep-learning-based AU detection systems is the lack of large facial image databases annotated by AU experts. In that respect, the ABAW challenge paves the way toward better AU detection, as it provides a dataset of 2M AU-annotated frames. In this paper, we present our submission to the ABAW3 challenge. In a nutshell, we applied a multi-label detection transformer that leverages multi-head attention to learn which part of the face image is most relevant for predicting each AU.
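The idea of one attention pattern per AU can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes one learned query vector per AU that cross-attends over a grid of image patch features, followed by a per-AU sigmoid output. All names (`au_cross_attention`, `predict_aus`, `init_params`), the sizes (12 AUs, 49 patches, 64 dimensions, 4 heads), and the random weights are hypothetical placeholders chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def au_cross_attention(patches, au_queries, w_q, w_k, w_v, num_heads):
    """Multi-head cross-attention from AU queries to image patches.

    patches:    (P, D) patch features from some image backbone (assumed).
    au_queries: (A, D) one learned query per AU (assumed design).
    Returns (A, D) per-AU features and (H, A, P) attention maps.
    """
    num_patches, dim = patches.shape
    num_aus = au_queries.shape[0]
    d_head = dim // num_heads

    def split_heads(x, n):
        # (n, D) -> (H, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(au_queries @ w_q, num_aus)
    k = split_heads(patches @ w_k, num_patches)
    v = split_heads(patches @ w_v, num_patches)

    # Scaled dot-product scores: how relevant each patch is to each AU.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, A, P)
    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(num_aus, dim)
    return out, attn


def init_params(num_aus, dim, seed=0):
    # Random stand-ins for weights that would normally be learned.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)
    return {
        "au_queries": rng.normal(size=(num_aus, dim)),
        "w_q": rng.normal(size=(dim, dim)) * scale,
        "w_k": rng.normal(size=(dim, dim)) * scale,
        "w_v": rng.normal(size=(dim, dim)) * scale,
        "w_out": rng.normal(size=(num_aus, dim)) * scale,
        "b_out": np.zeros(num_aus),
    }


def predict_aus(patches, params, num_heads=4):
    feats, attn = au_cross_attention(
        patches, params["au_queries"],
        params["w_q"], params["w_k"], params["w_v"], num_heads,
    )
    # Multi-label output: an independent sigmoid per AU, not a softmax.
    logits = (feats * params["w_out"]).sum(axis=1) + params["b_out"]
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, attn


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    patches = rng.normal(size=(49, 64))  # e.g. a 7x7 feature grid
    params = init_params(num_aus=12, dim=64)
    probs, attn = predict_aus(patches, params)
    print(probs.shape, attn.shape)
```

Each row of the returned attention tensor is a distribution over patches for one AU and one head, so in a trained model of this kind it could be reshaped to the patch grid and inspected as a map of which facial regions drive each prediction.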