CL | Oct 7, 2022

Towards Multi-Modal Sarcasm Detection via Hierarchical Congruity Modeling with Knowledge Enhancement

arXiv:2210.03501v224.8303 citations | h-index: 59 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses sarcasm detection for social media analysis, presenting an incremental improvement by integrating hierarchical modeling and knowledge enhancement.

The paper tackles the challenge of detecting sarcasm in multi-modal content by proposing a hierarchical framework that models both atomic-level and composition-level congruity between text and images, and incorporates external knowledge. The model demonstrates superiority on a public Twitter dataset.

Sarcasm is a linguistic phenomenon indicating a discrepancy between literal meanings and implied intentions. Because of its sophisticated nature, it is usually hard to detect from text alone. As a result, multi-modal sarcasm detection has received growing attention in both academia and industry. However, most existing techniques model only the atomic-level inconsistencies between the text input and its accompanying image, ignoring more complex compositions in both modalities. Moreover, they neglect the rich information contained in external knowledge, e.g., image captions. In this paper, we propose a novel hierarchical framework for sarcasm detection that explores both atomic-level congruity, based on a multi-head cross-attention mechanism, and composition-level congruity, based on graph neural networks, where a post with low congruity can be identified as sarcastic. In addition, we exploit the effect of various knowledge resources for sarcasm detection. Evaluation results on a public multi-modal sarcasm detection dataset based on Twitter demonstrate the superiority of our proposed model.
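
To make the idea of atomic-level congruity more concrete, the following is a minimal sketch, not the authors' implementation. It assumes text tokens and image patches are already embedded in a shared vector space, uses a single attention head with no learned projections (the paper describes a multi-head mechanism), and scores congruity as the mean cosine similarity between each text token and its attended image representation. The function names `cross_attention` and `congruity_score` are hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def cross_attention(text_tokens, image_patches):
    # Assumption: single head, queries = text tokens, keys = values = image patches.
    d = len(text_tokens[0])
    attended = []
    for q in text_tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in image_patches])
        attended.append([
            sum(w * p[i] for w, p in zip(weights, image_patches))
            for i in range(d)
        ])
    return attended

def congruity_score(text_tokens, image_patches):
    # Hypothetical scoring rule: mean cosine similarity between each token
    # and the image content it attends to. Lower values suggest incongruity.
    attended = cross_attention(text_tokens, image_patches)
    sims = [cosine(q, a) for q, a in zip(text_tokens, attended)]
    return sum(sims) / len(sims)
```

Under these assumptions, a post whose image embeddings point away from its text embeddings receives a lower score than one whose image agrees with the text, which matches the abstract's intuition that low congruity signals sarcasm. The composition-level (graph-based) stage and the knowledge enhancement are not covered by this sketch.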

