Hybrid Multimodal Fusion for Humor Detection
This work addresses humor detection in a specific domain (sports press conferences), but it is incremental as it builds on existing multimodal fusion techniques.
The paper tackles humor detection from audiovisual recordings of German football Bundesliga press conferences by proposing a hybrid multimodal fusion model that achieves an AUC of 0.8972 on the test set.
In this paper, we present our solution to the MuSe-Humor sub-challenge of the Multimodal Sentiment Analysis Challenge (MuSe) 2022. The goal of the MuSe-Humor sub-challenge is to detect humor in audiovisual recordings of German football Bundesliga press conferences, with performance measured by AUC. The recordings are annotated for humor displayed by the coaches. For this sub-challenge, we first build a discriminative model using a transformer module and a BiLSTM module, and then propose a hybrid fusion strategy that uses the prediction results of each modality to improve the model's performance. Our experiments demonstrate the effectiveness of the proposed model and hybrid fusion strategy for multimodal fusion; the proposed model achieves an AUC of 0.8972 on the test set.
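To make the idea of fusing per-modality predictions and scoring them by AUC concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the hybrid fusion strategy, so a plain weighted average of per-modality humor probabilities stands in for it here. The function names `roc_auc` and `late_fusion`, the modality names, the weights, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs that the scores rank correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def late_fusion(preds: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality probabilities (an assumed,
    simplified stand-in for the paper's hybrid fusion strategy)."""
    total = sum(weights[m] for m in preds)
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) / total
            for i in range(n)]


# Toy data: 1 = humor, 0 = no humor (hypothetical values).
labels = [1, 0, 1, 0]
preds = {
    "audio": [0.9, 0.6, 0.4, 0.2],
    "video": [0.3, 0.2, 0.8, 0.7],
}
weights = {"audio": 1.0, "video": 1.0}

fused = late_fusion(preds, weights)
for name, scores in preds.items():
    print(name, roc_auc(labels, scores))
print("fused", roc_auc(labels, fused))
```

In this toy case each modality misranks one pair on its own, while the averaged scores rank every pair correctly, which is the kind of complementarity a fusion step tries to exploit. In practice the weights would be tuned on a development set rather than fixed by hand.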