Multi-modal Expression Recognition with Ensemble Method
This work is an incremental improvement in affective computing for real-world (in-the-wild) scenarios.
The paper tackles expression recognition by combining multimodal features with ensemble strategies, achieving an average F1 score of 0.45774 on the validation set.
This paper presents our submission to the Expression Classification Challenge of the fifth Affective Behavior Analysis in-the-wild (ABAW) Competition. In our method, we combine multimodal features extracted by several different pre-trained models to capture more effective emotional information. For these combinations of visual and audio features, we use two temporal encoders to explore the temporal contextual information in the data. In addition, we employ several ensemble strategies for different experimental settings to obtain the most accurate expression recognition results. Our system achieves an average F1 score of 0.45774 on the validation set.
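To make the ensemble step and the reported metric concrete, the following is a minimal sketch, not the submission's actual pipeline. It assumes that the ensemble averages per-frame class probabilities across models (soft voting, which is only one of several possible ensemble strategies) and that the "average F1 score" is the unweighted mean of per-class F1 scores. The function names, the three-class setup, and the toy probabilities are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def average_probabilities(
    model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Soft voting: average per-frame class probabilities across models.

    model_probs[m][t][c] is the probability model m assigns to class c at frame t.
    """
    n_models = len(model_probs)
    n_frames = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    return [
        [sum(m[t][c] for m in model_probs) / n_models for c in range(n_classes)]
        for t in range(n_frames)
    ]


def argmax_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most probable class index for each frame."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1 (assumed reading of 'average F1 score')."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Toy example: two hypothetical models, four frames, three classes.
model_a = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5], [0.4, 0.5, 0.1]]
model_b = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.2, 0.3], [0.6, 0.3, 0.1]]
y_true = [0, 1, 2, 1]

fused = average_probabilities([model_a, model_b])
pred = argmax_labels(fused)
score = macro_f1(y_true, pred, n_classes=3)
print(pred, round(score, 5))
```

In this toy case the two models disagree on the last two frames, and averaging their probabilities settles each disagreement in favor of the more confident model. Weighted averaging or majority voting over hard labels would be drop-in alternatives to `average_probabilities`.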