CL Jul 1, 2022
Panning for gold: Lessons learned from the platform-agnostic automated detection of political content in textual data
Mykola Makhortykh, Ernesto de León, Aleksandra Urman et al.
The growing availability of data about online information behaviour enables new possibilities for political communication research. However, the volume and variety of these data make them difficult to analyse and prompt the need for developing automated content analysis approaches relying on a broad range of natural language processing techniques (e.g. machine learning- or neural network-based ones). In this paper, we discuss how these techniques can be used to detect political content across different platforms. Using three validation datasets, which include a variety of political and non-political textual documents from online platforms, we systematically compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks. We also examine the impact of different modes of data preprocessing (e.g. stemming and stopword removal) on the low-cost implementations of these techniques using a large set (n = 66) of detection models. Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models, in contrast to the more robust performance of dictionary-based models on noisy data.
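As a rough illustration of the dictionary-based group of techniques mentioned in the abstract, the following minimal Python sketch flags a text as political when the share of dictionary terms among its tokens passes a threshold, with stopword removal as an optional preprocessing step. The term list, the stopword list, the threshold, and all function names are hypothetical and chosen only for illustration; they are not the dictionaries or models evaluated in the paper.

```python
import re

# Hypothetical mini-dictionary and stopword list (assumptions for illustration;
# the dictionaries used in the paper are not reproduced here).
POLITICAL_TERMS = {"election", "parliament", "minister", "vote", "policy", "government"}
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def tokenize(text, remove_stopwords=False):
    """Lowercase the text and split it into alphabetic tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def political_share(text, remove_stopwords=False):
    """Return the fraction of tokens that appear in the dictionary."""
    tokens = tokenize(text, remove_stopwords)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in POLITICAL_TERMS)
    return hits / len(tokens)


def is_political(text, threshold=0.1, remove_stopwords=False):
    """Classify a text as political if the dictionary share meets the threshold."""
    return political_share(text, remove_stopwords) >= threshold
```

Calling `is_political("The minister announced a new policy in parliament")` applies the default threshold of 0.1 without stopword removal. Passing `remove_stopwords=True` changes the denominator of the share, which is one simple way a preprocessing choice can shift a dictionary-based score.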
45.4 CY Apr 15
High-Risk Memories? Comparative audit of the representation of Second World War atrocities in Ukraine by generative AI applications
Mykola Makhortykh, Victoria Vziatysheva, Maryna Sydorova
The rise of generative artificial intelligence (genAI) models poses new possibilities and risks for how the past is remembered by accelerating content production and altering the process of information discovery. The most critical risk is historical misrepresentation, which ranges from the distortion of facts and inaccurate depiction of specific groups to more subtle forms, such as the selective moralization of history. The dangers of misrepresentation of the past are particularly pronounced for high-risk memories, such as memories of past atrocities, which have a strong emotional load and are often instrumentalised by political actors. To understand how substantive this risk is, we empirically investigate how genAI applications deal with high-risk memories of the Second World War atrocities in Ukraine. This case is crucial due to the scope of the atrocities and the intense, often instrumentalised, contestation surrounding their memory. We audit the performance of three common genAI applications for different types of misrepresentation, including hallucinations and inconsistent moralization, and discuss the implications for future memory practices.
CL Aug 30, 2024
Finding frames with BERT: A transformer-based approach to generic news frame detection
Vihang Jumle, Mykola Makhortykh, Maryna Sydorova et al.
Framing is among the most extensively used concepts in the field of communication science. The availability of digital data offers new possibilities for studying how specific aspects of social reality are made more salient in online communication, but it also raises challenges related to the scaling of framing analysis and its adaptation to new research areas (e.g. studying the impact of artificial intelligence-powered systems on the representation of societally relevant issues). To address these challenges, we introduce a transformer-based approach for generic news frame detection in Anglophone online content. In doing so, we discuss the composition of the training and test datasets, the model architecture, and the validation of the approach, and we reflect on the possibilities and limitations of the automated detection of generic news frames.