Unsupervised Bias Detection in College Student Newspapers
This work provides a tool for researchers and institutions to analyze bias in student newspapers with less reliance on labeled data, although the contribution is incremental because it builds on existing sentiment-analysis and web-scraping methods.
The paper tackles the problem of detecting bias in college student newspapers with an unsupervised pipeline. The pipeline scrapes complex archive sites to build a dataset of 23,154 entries from 14 papers, then uses large language model sentiment analysis to calculate bias with minimal human input.
This paper presents a pipeline for scraping college newspaper archives and detecting bias in them with minimal human influence. It introduces a framework for scraping complex archive sites that automated tools fail to extract data from, and uses it to generate a dataset of 14 student papers with 23,154 entries. The dataset can then be queried by keyword, and bias is calculated by comparing the sentiment of a large language model summary of each article to the sentiment of the original article. The advantages of this approach are that it is less comparative than reconstruction bias and requires less labeled data than generating keyword sentiment. Results are calculated for politically charged words as well as for control words to show how conclusions can be drawn. The complete method facilitates the extraction of nuanced insights with minimal assumptions and categorizations, paving the way for a more objective understanding of bias within student newspaper sources.
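The keyword-query and summary-comparison steps can be illustrated with a small sketch. It is not the paper's implementation, and several parts are assumptions. First, it assumes that bias is the article's sentiment minus its summary's sentiment, averaged per paper. Second, the LLM summaries are taken as precomputed strings. Third, the hypothetical toy lexicon in `sentiment` stands in for the large language model sentiment scorer. The names `Entry`, `sentiment`, `query`, `bias`, and `keyword_bias` are all illustrative.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical toy lexicon: a stand-in for an LLM sentiment model.
POSITIVE = {"good", "great", "support", "success", "praised"}
NEGATIVE = {"bad", "fail", "failed", "protest", "criticized", "crisis"}


@dataclass
class Entry:
    paper: str    # newspaper name
    text: str     # original article text
    summary: str  # LLM-generated summary (assumed to be precomputed)


def sentiment(text: str) -> float:
    """Score in [-1, 1]: (pos - neg) / (pos + neg), or 0.0 with no lexicon hits."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def query(entries: list[Entry], keyword: str) -> list[Entry]:
    """Keep entries whose article text mentions the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [e for e in entries if kw in e.text.lower()]


def bias(entry: Entry) -> float:
    """Assumed metric: how much more positive or negative the article is than its summary."""
    return sentiment(entry.text) - sentiment(entry.summary)


def keyword_bias(entries: list[Entry], keyword: str) -> dict[str, float]:
    """Mean bias per paper over the entries that match the keyword."""
    by_paper: dict[str, list[float]] = {}
    for e in query(entries, keyword):
        by_paper.setdefault(e.paper, []).append(bias(e))
    return {paper: mean(scores) for paper, scores in by_paper.items()}


if __name__ == "__main__":
    entries = [
        Entry("Paper A", "Students praised the great new policy.", "The college announced a new policy."),
        Entry("Paper A", "The policy failed and students protest.", "Students criticized the policy."),
        Entry("Paper B", "The policy is a crisis and a bad idea.", "The policy was discussed."),
    ]
    print(keyword_bias(entries, "policy"))   # {'Paper A': 0.5, 'Paper B': -1.0}
    print(keyword_bias(entries, "library"))  # {} (control word with no matches)
```

Subtracting the summary's score treats the summary as a neutral baseline for the same content. Under this assumption, a positive value means the article reads more favorably than a condensed restatement of its facts, and a negative value means it reads less favorably. Running the same function on control words gives a reference point for interpreting the scores of politically charged words.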