CL | Jun 16, 2025

Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings

arXiv:2506.13569v1 | 1 citation | h-index: 6 | Has Code | Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
Originality: Synthesis-oriented
AI Analysis

This work examines how linguistic shifts reflect cultural change and is aimed at researchers in computational linguistics and Croatian media analysis. It is incremental, however, in that it applies existing methods to a new dataset.

The study measures semantic change over time by applying diachronic word embeddings to a corpus of 9.5 million Croatian news articles spanning 25 years. It finds that the embeddings capture shifts in terms tied to major topics such as COVID-19 and EU accession, and that post-2020 embeddings encode increased positivity in sentiment analysis tasks.

Measuring how the semantics of words change over time improves our understanding of how cultures and perspectives change. Diachronic word embeddings help us quantify this shift, although previous studies leveraged substantial temporally annotated corpora. In this work, we use a corpus of 9.5 million Croatian news articles spanning the past 25 years and quantify semantic change using skip-gram word embeddings trained on five-year periods. Our analysis finds that word embeddings capture linguistic shifts of terms pertaining to major topics in this timespan (COVID-19, Croatia joining the European Union, technological advancements). We also find evidence that embeddings from post-2020 encode increased positivity in sentiment analysis tasks, in contrast to studies reporting a decline in mental health over the same period.
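The abstract does not say how embeddings from different five-year periods are compared. As an illustration only, the sketch below uses one common approach for diachronic embeddings: rotate one period's embedding matrix onto the other with orthogonal Procrustes, then score each word by the cosine distance between its two vectors. Every name here (`procrustes_align`, `semantic_change`, the matrix arguments) is hypothetical, and the sketch assumes both matrices cover the same vocabulary in the same row order. It is not taken from the paper's code.

```python
import numpy as np


def procrustes_align(base, other):
    """Rotate `other` onto `base`.

    Finds the orthogonal matrix R that minimises ||other @ R - base||_F
    (orthogonal Procrustes) and returns other @ R.
    """
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def semantic_change(emb_old, emb_new, vocab):
    """Score each word by how far its vector moved between two periods.

    emb_old, emb_new: arrays of shape (len(vocab), dim); row i of both
    matrices is assumed to hold the vector for vocab[i].
    """
    aligned = procrustes_align(emb_old, emb_new)
    return {
        word: cosine_distance(emb_old[i], aligned[i])
        for i, word in enumerate(vocab)
    }
```

Given two such matrices, `semantic_change(emb_old, emb_new, vocab)` returns a dictionary from word to a score between 0 and 2, where higher values suggest a larger shift in usage. Alignment is needed because separately trained embedding spaces are only comparable up to a rotation.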

Code Implementations: 1 repo
