"Don't quote me on that": Finding Mixtures of Sources in News Articles
This work addresses the need for automated source analysis in journalism, which supports downstream tasks such as opinion and argumentation mining, and represents a first step towards machine-in-the-loop computational journalism systems.
The paper tackles the problem of identifying and categorizing sources in news articles. It constructs an ontological labeling system based on each source's affiliation and role, and builds a probabilistic model that infers these attributes for named sources and describes articles as mixtures of sources. The model correctly infers source type in 80% of expert-evaluated trials.
Journalists publish statements provided by people, or \textit{sources}, to contextualize current events, help voters make informed decisions, and hold powerful individuals accountable. In this work, we construct an ontological labeling system for sources based on each source's \textit{affiliation} and \textit{role}. We build a probabilistic model to infer these attributes for named sources and to describe news articles as mixtures of these sources. Our model outperforms existing mixture modeling and co-clustering approaches and correctly infers source type in 80\% of expert-evaluated trials. Such work can facilitate research in downstream tasks like opinion and argumentation mining, representing a first step towards machine-in-the-loop \textit{computational journalism} systems.
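To make the phrase "mixtures of sources" concrete, the following minimal sketch represents each source by an (affiliation, role) pair and summarizes an article as the proportion of its sources of each type. It is an illustration only, not the probabilistic model described above: it assumes the labels are already known and simply counts them, whereas the model infers them. The `Source` class, the `source_mixture` function, the label values, and the example names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A named source with its two labels (label values are hypothetical)."""
    name: str
    affiliation: str  # e.g. "government" (illustrative label, not from the paper)
    role: str         # e.g. "decision-maker" (illustrative label, not from the paper)


def source_mixture(sources):
    """Return the empirical proportion of each (affiliation, role) pair.

    This is plain counting over known labels, not inference.
    """
    if not sources:
        return {}
    counts = Counter((s.affiliation, s.role) for s in sources)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# A made-up article quoting four sources.
article = [
    Source("A. Smith", "government", "decision-maker"),
    Source("B. Jones", "government", "decision-maker"),
    Source("C. Lee", "academic", "expert"),
    Source("D. Kim", "private citizen", "witness"),
]
mixture = source_mixture(article)
```

Here half of the article's sources share the pair `("government", "decision-maker")`, so that pair receives weight 0.5 and the remaining two pairs receive 0.25 each.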