CL · AI · May 3, 2022

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions

arXiv:2205.01506v1 · 585 citations · h-index: 20
Originality: Synthesis-oriented
AI Analysis

This paper provides a resource for studying political discourse and code-switching in Basque and Spanish, but the contribution is incremental: it targets a specific domain and introduces no new methods.

The authors compiled BasqueParl, a bilingual corpus of Basque parliamentary transcripts with heavy Basque-Spanish code-switching, enriched with metadata and processed for named entities and lemmas, and used it to analyze language use across time, parties, and gender.

Parliamentary transcripts provide a valuable resource to understand the reality and know about the most important facts that occur over time in our societies. Furthermore, the political debates captured in these transcripts facilitate research on political discourse from a computational social science perspective. In this paper we release the first version of a newly compiled corpus from Basque parliamentary transcripts. The corpus is characterized by heavy Basque-Spanish code-switching, and represents an interesting resource to study political discourse in contrasting languages such as Basque and Spanish. We enrich the corpus with metadata related to relevant attributes of the speakers and speeches (language, gender, party...) and process the text to obtain named entities and lemmas. The obtained metadata is then used to perform a detailed corpus analysis which provides interesting insights about the language use of the Basque political representatives across time, parties and gender.
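The kind of metadata-driven analysis the abstract describes (language use across parties and gender) can be illustrated with a minimal sketch. The records, field names (`year`, `party`, `gender`, `language`), and the helper `language_share` below are hypothetical and invented for illustration; they are not the actual BasqueParl schema, data, or the authors' analysis code.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: field names and values are illustrative only,
# not the real BasqueParl schema or data.
speeches = [
    {"year": 2019, "party": "A", "gender": "F", "language": "eu"},
    {"year": 2019, "party": "A", "gender": "M", "language": "es"},
    {"year": 2019, "party": "B", "gender": "F", "language": "es"},
    {"year": 2020, "party": "B", "gender": "M", "language": "es"},
    {"year": 2020, "party": "A", "gender": "F", "language": "eu"},
]


def language_share(records, key):
    """Return {group: {language: fraction of speeches}} grouped by `key`."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[key]][rec["language"]] += 1
    return {
        group: {lang: n / sum(c.values()) for lang, n in c.items()}
        for group, c in counts.items()
    }


# Share of speeches per language, broken down by party and by gender.
by_party = language_share(speeches, "party")
by_gender = language_share(speeches, "gender")
```

Passing `"year"` as the key gives the same breakdown over time. A real analysis would likely count tokens or sentences rather than whole speeches, since code-switching happens within a single speech.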

Code Implementations: 1 repo
