MM CL SI · Feb 26, 2023

Understanding Social Media Cross-Modality Discourse in Linguistic Space

Chunpu Xu, Hanzhuo Tan, Jing Li, Piji Li

arXiv:2302.13311v139.8292 citations · h-index: 29 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a gap in the analysis of multimedia discourse on social media, though the advance is incremental because it builds on existing multimodal methods.

The paper studies how images and texts combine to form coherent meanings on social media. It introduces the concept of cross-modality discourse and builds a dataset of 16K tweets annotated with discourse labels. In the reported experiments, a multimedia encoder based on multi-head attention with captions achieves state-of-the-art results.

Multimedia communication that combines text and images is popular on social media. However, few studies have examined how images are structured with texts to form coherent meanings in human cognition. To fill this gap, we present a novel concept of cross-modality discourse, reflecting how human readers couple image and text understandings. Text descriptions (termed subtitles) are first derived from images in the multimedia contexts. Five labels (entity-level insertion, projection, and concretization, and scene-level restatement and extension) are then employed to shape the structure of subtitles and texts and to present their joint meanings. As a pilot study, we also build the first dataset containing 16K multimedia tweets with manually annotated discourse labels. The experimental results show that the multimedia encoder based on multi-head attention with captions obtains state-of-the-art results.
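The five-label scheme described in the abstract can be sketched as a small data structure. This is only an illustrative sketch: the label names and their entity/scene grouping come from the abstract, while the class names (`Level`, `DiscourseLabel`, `AnnotatedTweet`), the field names, and the helper `labels_at` are hypothetical and are not taken from the authors' code or dataset format.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Granularity at which a subtitle relates to the tweet text."""
    ENTITY = "entity"
    SCENE = "scene"


class DiscourseLabel(Enum):
    """The five cross-modality discourse labels named in the abstract.

    Each value is a (name, level) pair; the grouping follows the abstract.
    """
    INSERTION = ("insertion", Level.ENTITY)
    PROJECTION = ("projection", Level.ENTITY)
    CONCRETIZATION = ("concretization", Level.ENTITY)
    RESTATEMENT = ("restatement", Level.SCENE)
    EXTENSION = ("extension", Level.SCENE)

    @property
    def level(self) -> Level:
        return self.value[1]


@dataclass
class AnnotatedTweet:
    """Hypothetical record layout: tweet text, image-derived subtitle, label."""
    text: str
    subtitle: str
    label: DiscourseLabel


def labels_at(level: Level) -> list[str]:
    """Return the names of all labels defined at the given level."""
    return [label.value[0] for label in DiscourseLabel if label.level is level]


# Placeholder strings only; the label here is chosen arbitrarily for illustration.
example = AnnotatedTweet(
    text="<tweet text>",
    subtitle="<text description derived from the image>",
    label=DiscourseLabel.RESTATEMENT,
)
```

Storing the level alongside each label keeps the two-tier structure (entity versus scene) queryable, for example when reporting per-level statistics over an annotated set.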
