43.2 · CY · Mar 24
Evidence of political bias in search engines and language models before major elections
Íris Damião, Paulo Almeida, João Franco et al.
Search engines (SEs) and large language models (LLMs) are central to political information access, yet their algorithmic decisions and potential underlying biases remain underexplored. We developed a standardized, privacy-preserving, bot-and-proxy methodology to audit four SEs and two LLMs before the 2024 European Parliament and US presidential elections. We collected answers to approximately 4,360 queries related to elections in five EU countries and 15 US counties, identified political entities and topics in those answers, and mapped them to ideological positions (EU) or issue associations (US). In Europe, SE results disproportionately mentioned far-right entities beyond levels expected from polls, past elections, or media salience. In the US, Google strongly favored topics more important to Republican voters, while other search engines favored issues more relevant to Democrats. LLM responses were more balanced, although there is evidence of overrepresentation of far-right (and Green) entities. These results show evidence of bias and open important discussions on how even small skews in widely used platforms may influence democratic processes, calling for systematic audits of their outputs.
CL · Apr 26, 2020
PTPARL-D: Annotated Corpus of 44 years of Portuguese Parliament debates
Paulo Almeida, Manuel Marques-Pita, Joana Gonçalves-Sá
In a representative democracy, some decide in the name of the rest, and these elected officials are commonly gathered in public assemblies, such as parliaments, where they discuss policies, legislate, and vote on fundamental initiatives. A core aspect of such democratic processes is the plenary debate, where important public discussions take place. Many parliaments around the world are increasingly keeping the transcripts of such debates, and other parliamentary data, in digital formats accessible to the public, increasing transparency and accountability. Furthermore, some parliaments are converting old paper transcripts to semi-structured digital formats. However, these records are often provided only as raw text or even as images, with little to no annotation and in inconsistent formats, making them difficult to analyze and study and reducing both transparency and public reach. Here, we present PTPARL-D, an annotated corpus of debates in the Portuguese Parliament, from 1976 to 2019, covering the entire period of Portuguese democracy.