AI · May 23, 2025
MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models
Zhengyi Zhao, Shubo Zhang, Yuxi Zhang et al.
Memes have emerged as a popular form of multimodal online communication, and their interpretation depends heavily on the specific context in which they appear. Current approaches predominantly focus on isolated meme analysis, either for harmful content detection or standalone interpretation, overlooking a fundamental challenge: the same meme can express different intents depending on its conversational context. This oversight creates an evaluation gap: although humans intuitively recognize how context shapes meme interpretation, Large Vision-Language Models (LVLMs) struggle to understand context-dependent meme intent. To address this limitation, we introduce MemeReaCon, a novel benchmark designed to evaluate how LVLMs understand memes in their original context. We collected memes from five different Reddit communities, keeping each meme's image, post text, and user comments together. We carefully labeled how the text and meme work together, what the poster intended, how the meme is structured, and how the community responded. Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the context or focus too heavily on visual details while overlooking communicative purpose. MemeReaCon thus serves both as a diagnostic tool that exposes current limitations and as a challenging benchmark to drive the development of LVLMs with more sophisticated context-aware understanding.
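As a rough illustration of the benchmark record described above, the following sketch keeps a meme together with its post text, comments, and the four annotation types. All class, field, and function names are hypothetical and are not taken from the MemeReaCon release; the prompt format is likewise an assumption, meant only to contrast isolated and in-context evaluation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemeInContext:
    """One hypothetical benchmark example: a meme plus its original context."""
    subreddit: str
    image_path: str
    post_text: str
    comments: List[str] = field(default_factory=list)
    # Annotation fields mirroring the four label types named in the abstract.
    text_meme_relation: str = ""
    poster_intent: str = ""
    meme_structure: str = ""
    community_response: str = ""


def build_prompt(example: MemeInContext, include_context: bool = True) -> str:
    """Assemble a text prompt; the image itself would be passed to the model separately."""
    lines = [f"Image: {example.image_path}"]
    if include_context:
        lines.append(f"Community: {example.subreddit}")
        lines.append(f"Post text: {example.post_text}")
        lines.extend(f"Comment: {c}" for c in example.comments)
    lines.append("Question: What does the poster intend to communicate with this meme?")
    return "\n".join(lines)


example = MemeInContext(
    subreddit="r/example",          # placeholder community name
    image_path="meme.png",          # placeholder path
    post_text="Monday again",
    comments=["same here"],
)
print(build_prompt(example))                         # meme with context
print(build_prompt(example, include_context=False))  # isolated meme
```

Comparing a model's answers on the two prompts is one simple way to see whether it actually uses the conversational context.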
SP · Oct 13, 2021
Positional-Spectral-Temporal Attention in 3D Convolutional Neural Networks for EEG Emotion Recognition
Jiyao Liu, Yanxi Zhao, Hao Wu et al.
Recognizing human emotions plays a critical role in daily communication. Neuroscience has demonstrated that different emotional states produce different degrees of activation across brain regions, EEG frequency bands, and time points. In this paper, we propose a novel structure to extract informative EEG features for emotion recognition. The proposed module, denoted PST-Attention, consists of Positional, Spectral, and Temporal Attention modules that together extract more discriminative EEG features. Specifically, the Positional Attention module captures the regions activated by different emotions in the spatial dimension. The Spectral and Temporal Attention modules assign weights to different frequency bands and temporal slices, respectively. Our method is adaptive and efficient, and it can be fitted into 3D Convolutional Neural Networks (3D-CNNs) as a plug-in module. We conduct experiments on two real-world datasets. A 3D-CNN combined with our module achieves promising results and demonstrates that PST-Attention is able to capture stable patterns for emotion recognition from EEG.
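The idea of weighting an EEG feature tensor along its positional, spectral, and temporal axes can be sketched as follows. This is a simplified illustration, not the authors' implementation: the tensor layout `(bands, height, width, time)`, the pooling-plus-softmax scoring, and the weight matrices `w_spec` and `w_temp` are all assumptions, and a trained module would learn its parameters inside the 3D-CNN.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(x, axis, weight):
    """Weights along one axis: average-pool all other axes, apply a linear map, softmax."""
    other_axes = tuple(i for i in range(x.ndim) if i != axis)
    descriptor = x.mean(axis=other_axes)   # shape: (x.shape[axis],)
    return softmax(weight @ descriptor)


def pst_attention(x, w_spec, w_temp):
    """Rescale x of assumed shape (bands, height, width, time) along three kinds of axes."""
    bands, height, width, time = x.shape
    # Positional: one weight per location on the 2D electrode grid.
    pos_descriptor = x.mean(axis=(0, 3))   # shape: (height, width)
    pos = softmax(pos_descriptor.reshape(-1)).reshape(height, width)
    # Spectral and temporal: one weight per frequency band / temporal slice.
    spec = axis_attention(x, 0, w_spec)    # shape: (bands,)
    temp = axis_attention(x, 3, w_temp)    # shape: (time,)
    # Broadcast each weight vector back onto the full tensor.
    out = (x
           * pos[None, :, :, None]
           * spec[:, None, None, None]
           * temp[None, None, None, :])
    return out, pos, spec, temp


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 9, 9, 6))      # 4 bands, 9x9 grid, 6 temporal slices
out, pos, spec, temp = pst_attention(x, np.eye(4), np.eye(6))
print(out.shape, pos.shape, spec.shape, temp.shape)
```

Because the output has the same shape as the input, a block like this can sit between existing convolutional layers, which is the sense in which the abstract calls the module a plug-in.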
RO · Oct 13, 2021
Spatial-temporal Transformers for EEG Emotion Recognition
Jiyao Liu, Hao Wu, Li Zhang et al.
Electroencephalography (EEG) is a popular and effective tool for emotion recognition. However, the propagation mechanisms of EEG signals in the human brain and their intrinsic correlation with emotions remain unclear. This work proposes four transformer variants (spatial attention, temporal attention, sequential spatial-temporal attention, and simultaneous spatial-temporal attention) for EEG emotion recognition to explore the relationship between emotion and spatial-temporal EEG features. Specifically, spatial attention learns topological structure information and temporal attention learns time-varying EEG characteristics for emotion recognition. Sequential spatial-temporal attention applies spatial attention within each one-second segment and then temporal attention within one sample, to explore how strongly emotional stimulation influences the EEG signals of different electrodes in the same temporal segment. Simultaneous spatial-temporal attention, in which spatial and temporal attention are performed at the same time, models the relationship between different spatial features in different time segments. The experimental results demonstrate that simultaneous spatial-temporal attention yields the best emotion recognition accuracy among the design choices, indicating that modeling the correlation between spatial and temporal features of EEG signals is important for emotion recognition.
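One way to read the four attention variants is in terms of which tokens are allowed to attend to each other. The sketch below is a minimal interpretation under stated assumptions, not the paper's architecture: a sample is assumed to have shape `(segments, electrodes, features)`, and the query, key, and value projections are replaced by the identity so that only the attention pattern differs between variants.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Scaled dot-product self-attention over the second-to-last axis (identity projections)."""
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def spatial_attention(x):
    """Electrodes attend to each other within each segment; x has shape (T, C, D)."""
    return self_attention(x)


def temporal_attention(x):
    """Segments attend to each other separately for each electrode."""
    per_electrode = np.swapaxes(x, 0, 1)               # shape: (C, T, D)
    return np.swapaxes(self_attention(per_electrode), 0, 1)


def sequential_spatial_temporal(x):
    """Spatial attention first, then temporal attention on its output."""
    return temporal_attention(spatial_attention(x))


def simultaneous_spatial_temporal(x):
    """Every (segment, electrode) token attends to every other token in one step."""
    T, C, D = x.shape
    return self_attention(x.reshape(T * C, D)).reshape(T, C, D)


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8, 16))   # 5 segments, 8 electrodes, 16 features
for variant in (spatial_attention, temporal_attention,
                sequential_spatial_temporal, simultaneous_spatial_temporal):
    print(variant.__name__, variant(x).shape)
```

In this reading, only the simultaneous variant lets an electrode in one segment attend directly to a different electrode in another segment, which matches the abstract's description of modeling different spatial features in different time segments.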