CL · AI · Oct 23, 2022

A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis

arXiv:2210.12883v1 · 5 citations · h-index: 17
Originality: Synthesis-oriented
AI Analysis

This dataset fills a gap for researchers in computational linguistics and political analysis who work on Greek, though the contribution is incremental: it applies existing methods to new data.

The authors address the scarcity of large, diachronic political discourse datasets for resource-lean languages. They introduce a curated dataset of Greek Parliament Proceedings covering 1989 to 2020, with over 1 million speeches and accompanying metadata extracted from 5,355 files, and demonstrate its use in studying changes in word usage and semantic shifts.

Large, diachronic datasets of political discourse are hard to come across, especially for resource-lean languages such as Greek. In this paper, we introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020. It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files. We explain how it was constructed and the challenges that we had to overcome. The dataset can be used for both computational linguistics and political analysis, ideally combining the two. We present such an application, showing how the dataset can be used to study changes in word usage (i) through time and (ii) across significant historical events and political parties, (iii) by evaluating and employing algorithms for detecting semantic shifts.
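
As a rough illustration of what studying word usage through time can look like, the sketch below computes the relative frequency of a word per year. It is not the authors' method: the function name `relative_frequency_by_year`, the `(year, text)` record shape, and the whitespace tokenizer are all assumptions made for this example, and the toy speeches are invented rather than drawn from the dataset.

```python
from collections import defaultdict


def relative_frequency_by_year(speeches, word):
    """Return {year: occurrences of `word` / total tokens in that year}.

    `speeches` is an iterable of (year, text) pairs; this record shape is
    a hypothetical stand-in, not the dataset's actual schema.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    target = word.lower()
    for year, text in speeches:
        # Naive whitespace tokenization (assumption); real Greek text would
        # need proper tokenization and normalization.
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += sum(1 for token in tokens if token == target)
    # Skip years with no tokens to avoid division by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y] > 0}


# Toy records, invented for illustration only.
toy_speeches = [
    (1989, "η κρίση είναι εδώ"),
    (1989, "η οικονομία αναπτύσσεται"),
    (2010, "η κρίση η κρίση παντού"),
]
freqs = relative_frequency_by_year(toy_speeches, "κρίση")
```

Frequency counts like these only capture how often a word is used; detecting semantic shifts, as the abstract describes, requires dedicated algorithms that compare a word's contexts across periods.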

Code Implementations: 1 repo