CV, AI | Nov 30, 2023

LLVMs4Protest: Harnessing the Power of Large Language and Vision Models for Deciphering Protests in the News

arXiv:2311.18241v1 | h-index: 2 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work gives social movement scholars tools for analyzing protests in multi-modal news data, but the contribution is incremental: it applies existing methods to new datasets.

The researchers fine-tuned Longformer and Swin-Transformer V2 models on text and image datasets to identify protests in news articles, making these models publicly available for social movement scholars.

Large language and vision models have transformed how social movement scholars identify protests and extract key protest attributes from multi-modal data such as texts, images, and videos. This article documents how we fine-tuned two large pretrained transformer models, Longformer and Swin-Transformer V2, to infer potential protests in news articles using textual and imagery data. First, the Longformer model was fine-tuned using the Dynamics of Collective Action (DoCA) Corpus. We matched New York Times articles with the DoCA database to obtain a training dataset for downstream tasks. Second, the Swin-Transformer V2 model was trained on UCLA-protest imagery data. The UCLA-protest project contains labeled imagery data with information such as protest, violence, and sign. Both fine-tuned models will be available via https://github.com/Joshzyj/llvms4protest. We release this short technical report for social movement scholars who are interested in using LLVMs to infer protests in textual and imagery data.
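The abstract does not say how news articles were matched against the event database. As a minimal sketch only, the labeling step could look like the following, assuming a match on normalized title plus publication date. The `Article` class, the `label_articles` helper, and the matching key are all hypothetical illustrations, not the authors' procedure or the DoCA schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    # Hypothetical article record; real news metadata has many more fields.
    title: str
    published: date
    text: str


def normalize(title: str) -> str:
    """Lowercase a title and collapse whitespace so trivial differences do not block a match."""
    return " ".join(title.lower().split())


def label_articles(articles, protest_records):
    """Return (text, label) pairs: 1 if the article matches a protest record, else 0.

    protest_records is an iterable of (title, published) pairs standing in for
    rows of an event database. This key is an assumption for illustration.
    """
    protest_keys = {(normalize(t), d) for t, d in protest_records}
    return [
        (a.text, int((normalize(a.title), a.published) in protest_keys))
        for a in articles
    ]


# Toy data, invented for this example.
articles = [
    Article("Students March on City Hall", date(1968, 5, 2), "Hundreds of students marched..."),
    Article("Stocks Close Higher", date(1968, 5, 2), "The market rose..."),
]
records = [("students  march on city hall", date(1968, 5, 2))]

dataset = label_articles(articles, records)
print([label for _, label in dataset])  # prints [1, 0]
```

The resulting (text, label) pairs are the shape of input a binary text classifier would be fine-tuned on; exact matching is the simplest choice here, and a real pipeline would likely need fuzzier matching.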
