PTPARL-D: Annotated Corpus of 44 years of Portuguese Parliament debates
This work provides a resource for researchers and the public to study Portuguese democratic processes, though it is incremental as it focuses on data annotation rather than novel methods.
The authors address the lack of structured, annotated parliamentary data in digital formats by creating PTPARL-D, an annotated corpus covering 44 years (1976-2019) of plenary debates in the Portuguese Parliament.
In a representative democracy, some decide in the name of the rest, and these elected officials are commonly gathered in public assemblies, such as parliaments, where they discuss policies, legislate, and vote on fundamental initiatives. A core aspect of such democratic processes is the plenary debate, where important public discussions take place. Many parliaments around the world are increasingly keeping the transcripts of such debates, and other parliamentary data, in digital formats accessible to the public, increasing transparency and accountability. Furthermore, some parliaments are converting old paper transcripts to semi-structured digital formats. However, these records are often provided only as raw text or even as images, with little to no annotation and in inconsistent formats, making them difficult to analyze and study and reducing both transparency and public reach. Here, we present PTPARL-D, an annotated corpus of debates in the Portuguese Parliament from 1976 to 2019, covering the entire period of Portuguese democracy.