Mark Mets

2 Papers

26.1 SI, Mar 19
Crisis-induced differences in attention towards Ukraine in Twitter 2008-2023

Mark Mets, Peter Sheridan Dodds, Maximilian Schich

Aggression against Ukraine has drawn widespread international attention, particularly in the wake of the two Russian invasions of Ukrainian territory in 2014 and 2022. Although previous studies have examined social-media dynamics around these events, a comparative, longitudinal, data-driven view across languages is still missing. This article fills this gap by mapping added attention to "Ukraine" on Twitter in 28 languages from 2008 to 2023, using a deceptively simple, DNA-microarray-inspired cartography of log over-expression relative to each language's baseline frequency. This macro-scale visualization makes familiar events stand out while uncovering subtler patterns beyond the cognitive reach of any single-language audience. Most strikingly, two nearly non-overlapping language clusters emerge, one peaking around 2014 and the other around 2022, with distinct onset and decay profiles that mirror national readiness (or reluctance) to support Ukraine. By capturing attention at local, meso, and global scales, our approach offers a versatile tool for comparing relative bias across languages, user subgroups, platforms, or even historical print corpora. Ultimately, our cartographic approach reveals a troubling asymmetry: while publicly accessible data allows for an approximation of global attention patterns, the complete, unfiltered view remains largely hidden behind the closed, proprietary algorithms of major social media platforms, which thereby enjoy far more comprehensive access to global information flows.
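The abstract does not spell out how "log over-expression relative to each language's baseline frequency" is computed. As a rough illustration only, the sketch below assumes that the relative frequency of "Ukraine" in a given language and time bin is divided by that language's baseline (its frequency pooled over the whole period), and that the ratio is log2-transformed, as in the fold-change heatmaps familiar from DNA microarrays. The function name `log_overexpression`, the input layout, and the toy counts are hypothetical and are not the authors' code or data.

```python
import math

# Hypothetical sketch: log2 over-expression of a term per language and time bin,
# relative to that language's own baseline frequency (assumed definition).
def log_overexpression(mentions, totals):
    """Return {language: [log2(f_t / baseline) or None, ...]}.

    mentions[lang][t]: tweets in `lang` mentioning the term in time bin t
    totals[lang][t]:   all tweets in `lang` in time bin t
    """
    result = {}
    for lang, counts in mentions.items():
        n = totals[lang]
        # No mentions or no tweets at all: the baseline is undefined.
        if sum(n) == 0 or sum(counts) == 0:
            result[lang] = [None] * len(counts)
            continue
        # Assumed baseline: pooled relative frequency over the whole period.
        baseline = sum(counts) / sum(n)
        row = []
        for c, tot in zip(counts, n):
            if c == 0 or tot == 0:
                row.append(None)  # log of zero is undefined; leave the cell empty
            else:
                row.append(math.log2((c / tot) / baseline))
        result[lang] = row
    return result


# Toy counts invented for illustration, not data from the paper.
mentions = {"en": [10, 10, 40], "et": [1, 2, 1]}
totals = {"en": [1000, 1000, 1000], "et": [100, 100, 100]}
matrix = log_overexpression(mentions, totals)
for lang, row in matrix.items():
    print(lang, [round(v, 2) for v in row])
```

Positive cells mark time bins in which a language attends to the term more than it usually does, negative cells less; because each row is normalized by its own baseline, large and small languages become comparable when the rows are stacked into one heatmap-like matrix.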

CL, May 22, 2023
Automated stance detection in complex topics and small languages: the challenging case of immigration in polarizing news media

Mark Mets, Andres Karjus, Indrek Ibrus et al.

Automated stance detection and related machine learning methods can provide useful insights for media monitoring and academic research. Many of these approaches require annotated training datasets, which limits their applicability to languages where such data may not be readily available. This paper explores the applicability of large language models to automated stance detection in a challenging scenario involving a morphologically complex, lower-resource language and a socio-culturally complex topic: immigration. If the approach works in this case, it can be expected to perform at least as well in less demanding scenarios. We annotate a large set of pro- and anti-immigration examples and compare the performance of multiple language models as supervised learners. We also probe the usability of ChatGPT as an instructable zero-shot classifier for the same task. Supervised learning achieves acceptable performance, and ChatGPT yields similar accuracy. The latter result is promising, suggesting a potentially simpler and cheaper alternative for text classification tasks, including in lower-resource languages. We further use the best-performing model to investigate diachronic trends over seven years in two corpora of Estonian mainstream and right-wing populist news sources, demonstrating the applicability of the approach to news analytics and media monitoring settings, and discuss correspondences between stance changes and real-world events.