Semantic Scaling: Bayesian Ideal Point Estimates with Large Language Models
Semantic Scaling gives political scientists and other researchers a tool for measuring ideology in populations that are difficult to survey, such as the mass public or members of Congress, with greater accuracy and flexibility than previous text-based methods.
The paper tackles the problem of estimating ideological positions from text by introducing Semantic Scaling, a method that uses large language models to classify the stances expressed in documents and then applies item response theory to the resulting survey-like data. The method improves on existing text-based scaling methods and lets researchers define the ideological dimensions they measure.
This paper introduces "Semantic Scaling," a novel method for ideal point estimation from text. I leverage large language models to classify documents based on their expressed stances and extract survey-like data. I then use item response theory to scale subjects from these data. Semantic Scaling significantly improves on existing text-based scaling methods and allows researchers to explicitly define the ideological dimensions they measure. This represents the first scaling approach that allows such flexibility outside of survey instruments, and it opens new avenues of inquiry for populations that are difficult to survey. Additionally, it works with documents of varying length and produces valid estimates of both mass and elite ideology. I demonstrate that the method can differentiate between policy preferences and in-group/out-group affect. Among the public, Semantic Scaling outperforms Tweetscores according to human judgement; in Congress, it recaptures the first dimension of DW-NOMINATE while allowing for greater flexibility in resolving construct validity challenges.
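To make the scaling step concrete, the following is a minimal sketch of how survey-like stance data could be scaled with item response theory. It is an illustration under stated assumptions, not the paper's implementation: the response matrix is simulated rather than produced by a language model, the model is a simple one-parameter (Rasch) specification with standard normal priors, and the function name `fit_rasch_map` and all variable names are hypothetical. It computes a maximum a posteriori point estimate by gradient ascent rather than sampling a full posterior.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_rasch_map(responses, n_iter=500, lr=0.5):
    """MAP estimate for a one-parameter IRT model (an assumed, simplified model).

    Model: P(y_ij = 1) = sigmoid(theta_i - b_j), with N(0, 1) priors on
    every theta_i (subject ideal point) and b_j (item difficulty).
    responses: array of shape (n_subjects, n_items) holding 1.0 (subject
    expressed support for item j), 0.0 (opposition), or np.nan (no stance).
    """
    observed = ~np.isnan(responses)
    y = np.where(observed, responses, 0.0)
    theta = np.zeros(responses.shape[0])
    b = np.zeros(responses.shape[1])
    # Scale each gradient by its observation count to keep the steps stable.
    scale_theta = observed.sum(axis=1) + 1.0
    scale_b = observed.sum(axis=0) + 1.0
    for _ in range(n_iter):
        p = sigmoid(theta[:, None] - b[None, :])
        resid = (y - p) * observed          # zero where nothing was observed
        grad_theta = resid.sum(axis=1) - theta   # likelihood + prior gradient
        grad_b = -resid.sum(axis=0) - b
        theta = theta + lr * grad_theta / scale_theta
        b = b + lr * grad_b / scale_b
    return theta, b


# Simulated stand-in for stance labels: 200 subjects, 40 stance items,
# with roughly 30% of subject-item pairs unobserved.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=200)
true_b = rng.normal(size=40)
prob = sigmoid(true_theta[:, None] - true_b[None, :])
responses = (rng.random(prob.shape) < prob).astype(float)
responses[rng.random(prob.shape) < 0.3] = np.nan

theta_hat, b_hat = fit_rasch_map(responses)
print("correlation with simulated ideal points:",
      np.corrcoef(theta_hat, true_theta)[0, 1])
```

In this sketch, each column plays the role of a survey item defined by the researcher, and missing entries correspond to subjects whose documents express no stance on that item. A full Bayesian treatment would replace the point estimate with posterior draws, which also yields uncertainty intervals for each ideal point.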