Optimizing text representations to capture (dis)similarity between political parties
This work addresses a bottleneck in computational social science for researchers analyzing political texts, but its contribution is incremental, since it builds on existing methods.
The study tackled the problem of optimizing text representations for modeling political party similarities, finding that document structure-based heuristics with normalization can reliably predict party similarity without manual annotation.
Even though fine-tuned neural language models have been pivotal in enabling "deep" automatic text analysis, optimizing text representations for specific applications remains a crucial bottleneck. In this study, we look at this problem in the context of a task from computational social science, namely modeling pairwise similarities between political parties. Our research question is what level of structural information is necessary to create robust text representations. We contrast a strongly informed approach (which uses both claim span and claim category annotations) with approaches that replace one or both types of annotation with document structure-based heuristics. We evaluate our models on the manifestos of German parties for the 2021 federal election and find that heuristics that maximize within-party over between-party similarity, combined with a normalization step, lead to reliable party similarity prediction without the need for manual annotation.
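To make the two ingredients of the finding more concrete, here is a minimal sketch of how party similarity could be computed from text embeddings, together with a score that rewards within-party over between-party similarity. It is an illustration under stated assumptions, not the authors' pipeline: it assumes sentence embeddings per party are already available, treats per-dimension z-scoring as the normalization step, and represents each party by the centroid of its sentence embeddings compared with cosine similarity. All function names (`standardize`, `party_similarities`, `within_minus_between`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def standardize(embeddings):
    """Assumed normalization: z-score each dimension over all sentences of all parties."""
    all_vecs = [v for vecs in embeddings.values() for v in vecs]
    n, dim = len(all_vecs), len(all_vecs[0])
    means = [sum(v[d] for v in all_vecs) / n for d in range(dim)]
    # Fall back to 1.0 for constant dimensions to avoid division by zero.
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in all_vecs) / n) or 1.0
        for d in range(dim)
    ]
    return {
        party: [[(v[d] - means[d]) / stds[d] for d in range(dim)] for v in vecs]
        for party, vecs in embeddings.items()
    }


def centroid(vecs):
    """Mean vector of a party's sentence embeddings."""
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def party_similarities(embeddings):
    """Pairwise cosine similarity between party centroids, keyed by sorted party pairs."""
    cents = {p: centroid(vs) for p, vs in embeddings.items()}
    return {(a, b): cosine(cents[a], cents[b]) for a, b in combinations(sorted(cents), 2)}


def within_minus_between(embeddings):
    """Hypothetical selection criterion: mean within-party minus mean between-party
    sentence similarity. Higher values mean parties are better separated."""
    items = [(p, v) for p, vs in embeddings.items() for v in vs]
    within, between = [], []
    for (p1, v1), (p2, v2) in combinations(items, 2):
        (within if p1 == p2 else between).append(cosine(v1, v2))
    return sum(within) / len(within) - sum(between) / len(between)


if __name__ == "__main__":
    # Toy 2-d "embeddings" for three hypothetical parties; not real data.
    toy = {
        "A": [[1.0, 0.1], [0.9, 0.2]],
        "B": [[0.8, 0.4], [0.7, 0.5]],
        "C": [[0.1, 1.0], [0.2, 0.9]],
    }
    print("raw criterion:", within_minus_between(toy))
    print("normalized criterion:", within_minus_between(standardize(toy)))
    print("party similarities:", party_similarities(standardize(toy)))
```

On this toy input every raw vector lies in the same positive quadrant, so all sentence pairs look fairly similar; centering and scaling removes that shared offset, which pushes between-party similarities down and raises the within-minus-between score. Whether this mirrors the normalization used in the study is an assumption; the sketch only shows why a normalization step can sharpen party contrasts.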