CLAug 26, 2025
Affective Polarization across European ParliamentsBojan Evkoski, Igor Mozetič, Nikola Ljubešić et al.
Affective polarization, characterized by increased negativity and hostility towards opposing groups, has become a prominent feature of political discourse worldwide. Our study examines the presence of this type of polarization in a selection of European parliaments in a fully automated manner. Utilizing a comprehensive corpus of parliamentary speeches from the parliaments of six European countries, we employ natural language processing techniques to estimate parliamentarian sentiment. By comparing the levels of negativity conveyed in references to individuals from opposing groups versus one's own, we discover patterns of affectively polarized interactions. The findings demonstrate the existence of consistent affective polarization across all six European parliaments. Although activity correlates with negativity, there is no observed difference in affective polarization between less active and more active members of parliament. Finally, we show that reciprocity is a contributing mechanism in affective polarization between parliamentarians across all six parliaments.
SIMay 31, 2021
Retweet communities reveal the main sources of hate speechBojan Evkoski, Andraz Pelicon, Igor Mozetic et al.
We address a challenging problem of identifying main sources of hate speech on Twitter. On one hand, we carefully annotate a large set of tweets for hate speech, and deploy advanced deep learning to produce high quality hate speech classification models. On the other hand, we create retweet networks, detect communities and monitor their evolution through time. This combined approach is applied to three years of Slovenian Twitter data. We report a number of interesting results. Hate speech is dominated by offensive tweets, related to political and ideological issues. The share of unacceptable tweets is moderately increasing with time, from the initial 20% to 30% by the end of 2020. Unacceptable tweets are retweeted significantly more often than acceptable tweets. About 60% of unacceptable tweets are produced by a single right-wing community of only moderate size. Institutional Twitter accounts and media accounts post significantly less unacceptable tweets than individual accounts. In fact, the main sources of unacceptable tweets are anonymous accounts, and accounts that were suspended or closed during the years 2018-2020.
SIMay 28, 2021
Online Hate: Behavioural Dynamics and Relationship with MisinformationMatteo Cinelli, Andraž Pelicon, Igor Mozetič et al.
Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making necessary the development of appropriate countermeasures. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos through a machine learning model fine-tuned on a large set of hand-annotated data. Our analysis shows that there is no evidence of the presence of "serial haters", intended as active users posting exclusively hateful comments. Moreover, coherently with the echo chamber hypothesis, we find that users skewed towards one of the two categories of video channels (questionable, reliable) are more prone to use inappropriate, violent, or hateful language within their opponents community. Interestingly, users loyal to reliable sources use on average a more toxic language than their counterpart. Finally, we find that the overall toxicity of the discussion increases with its length, measured both in terms of number of comments and time. Our results show that, coherently with Godwin's law, online debates tend to degenerate towards increasingly toxic exchanges of views.
CLDec 2, 2019
Leveraging Contextual Embeddings for Detecting Diachronic Semantic ShiftMatej Martinc, Petra Kralj Novak, Senja Pollak
We propose a new method that leverages contextual embeddings for the task of diachronic semantic shift detection by generating time specific word representations from BERT embeddings. The results of our experiments in the domain specific LiverpoolFC corpus suggest that the proposed method has performance comparable to the current state-of-the-art without requiring any time consuming domain adaptation on large corpora. The results on the newly created Brexit news corpus suggest that the method can be successfully used for the detection of a short-term yearly semantic shift. And lastly, the model also shows promising results in a multilingual settings, where the task was to detect differences and similarities between diachronic semantic shifts in different languages.
SIApr 6, 2018
Forex trading and Twitter: Spam, bots, and reputation manipulationIgor Mozetič, Peter Gabrovšek, Petra Kralj Novak
Currency trading (Forex) is the largest world market in terms of volume. We analyze trading and tweeting about the EUR-USD currency pair over a period of three years. First, a large number of tweets were manually labeled, and a Twitter stance classification model is constructed. The model then classifies all the tweets by the trading stance signal: buy, hold, or sell (EUR vs. USD). The Twitter stance is compared to the actual currency rates by applying the event study methodology, well-known in financial economics. It turns out that there are large differences in Twitter stance distribution and potential trading returns between the four groups of Twitter users: trading robots, spammers, trading companies, and individual traders. Additionally, we observe attempts of reputation manipulation by post festum removal of tweets with poor predictions, and deleting/reposting of identical tweets to increase the visibility without tainting one's Twitter timeline.
CLSep 25, 2015
Sentiment of EmojisPetra Kralj Novak, Jasmina Smailović, Borut Sluban et al.
There is a new generation of emoticons, called emojis, that is increasingly being used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional contents, there are hundreds of emojis. But what are their emotional contents? We provide the first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of the emojis is computed from the sentiment of the tweets in which they occur. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by the sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of the emojis allows us to draw several interesting conclusions. It turns out that most of the emojis are positive, especially the most popular ones. The sentiment distribution of the tweets with and without emojis is significantly different. The inter-annotator agreement on the tweets with emojis is higher. Emojis tend to occur at the end of the tweets, and their sentiment polarity increases with the distance. We observe no significant differences in the emoji rankings between the 13 languages and the Emoji Sentiment Ranking. Consequently, we propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and a novel visualization in the form of a sentiment bar.
IRJul 31, 2015
Analysis of Financial News with NewsStreamPetra Kralj Novak, Miha Grcar, Borut Sluban et al.
Unstructured data, such as news and blogs, can provide valuable insights into the financial world. We present the NewsStream portal, an intuitive and easy-to-use tool for news analytics, which supports interactive querying and visualizations of the documents at different levels of detail. It relies on a scalable architecture for real-time processing of a continuous stream of textual data, which incorporates data acquisition, cleaning, natural-language preprocessing and semantic annotation components. It has been running for over two years and collected over 18 million news articles and blog posts. The NewsStream portal can be used to answer the questions when, how often, in what context, and with what sentiment was a financial entity or term mentioned in a continuous stream of news and blogs, and therefore providing a complement to news aggregators. We illustrate some features of our system in three use cases: relations between the rating agencies and the PIIGS countries, reflection of financial news on credit default swap (CDS) prices, the emergence of the Bitcoin digital currency, and visualizing how the world is connected through news.