CL · Nov 15, 2023

Social Meme-ing: Measuring Linguistic Variation in Memes

Naitian Zhou, David Jurgens, David Bamman

Berkeley

arXiv:2311.09130v1 · 10.332 citations · h-index: 42 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work studies multimodal language variation, a problem relevant to researchers in NLP and sociolinguistics. Its contribution is incremental: it extends existing text-based methods to memes.

The authors tackled the problem of measuring sociolinguistic variation in memes by developing a computational pipeline to cluster 3.8M meme images from Reddit into templates and semantic variables, discovering that meme usage varies meaningfully between subreddits and aligns with patterns found in written language.

Much work in the space of NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language comprised of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline to cluster individual instances of memes into templates and semantic variables, taking advantage of their multimodal structure in doing so. We apply this method to a large collection of meme images from Reddit and make available the resulting SemanticMemes dataset of 3.8M images clustered by their semantic function. We use these clusters to analyze linguistic variation in memes, discovering not only that socially meaningful variation in meme usage exists between subreddits, but that patterns of meme innovation and acculturation within these communities align with previous findings on written language.
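
The two-level clustering described in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: it assumes each meme already has a visual embedding and a text embedding (the `visual` and `text` fields are hypothetical), uses a simple greedy cosine-similarity grouping in place of whatever clustering the paper uses, and assumes that templates whose text is used similarly belong to the same semantic variable. The thresholds and the function names `greedy_cluster` and `cluster_memes` are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(vectors, threshold):
    """Assign each vector to the first cluster whose seed is similar enough.

    A deliberately simple stand-in for a real clustering algorithm.
    """
    seeds, labels = [], []
    for v in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(v, seed) >= threshold:
                labels.append(idx)
                break
        else:
            # No existing seed is close enough: start a new cluster.
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels


def cluster_memes(memes, visual_threshold=0.9, semantic_threshold=0.9):
    """Return a (template_id, variable_id) pair for every meme."""
    if not memes:
        return []
    # Stage 1: group meme instances into templates by visual similarity.
    template_ids = greedy_cluster([m["visual"] for m in memes], visual_threshold)
    # Stage 2: summarize each template by its mean text embedding, then
    # group templates with similar summaries into one semantic variable.
    centroids = []
    for t in range(max(template_ids) + 1):
        texts = [m["text"] for m, tid in zip(memes, template_ids) if tid == t]
        centroids.append([sum(col) / len(texts) for col in zip(*texts)])
    variable_ids = greedy_cluster(centroids, semantic_threshold)
    return [(tid, variable_ids[tid]) for tid in template_ids]


# Hypothetical toy data: three visual templates, two semantic functions.
memes = [
    {"visual": [1.0, 0.0], "text": [1.0, 0.0]},
    {"visual": [0.99, 0.05], "text": [0.9, 0.1]},
    {"visual": [0.0, 1.0], "text": [0.95, 0.05]},
    {"visual": [0.7, -0.7], "text": [0.0, 1.0]},
]
labels = cluster_memes(memes)
print(labels)
```

In this toy input the first two memes share a template, the third uses a visually different template with similar text, and the fourth differs in both. Templates grouped under one variable can then be treated as interchangeable variants when comparing usage across communities.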
