Robert Chew

CL
h-index16
4papers
150citations
Novelty46%
AI Score43

4 Papers

CLJun 23, 2023
LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding

Robert Chew, John Bollenbacher, Michael Wenger et al.

Deductive coding is a widely used qualitative research method for determining the prevalence of themes across documents. While useful, deductive coding is often burdensome and time consuming since it requires researchers to read, interpret, and reliably categorize a large body of unstructured text documents. Large language models (LLMs), like ChatGPT, are a class of quickly evolving AI tools that can perform a range of natural language processing and reasoning tasks. In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis. We outline the proposed approach, called LLM-assisted content analysis (LACA), along with an in-depth case study using GPT-3.5 for LACA on a publicly available deductive coding data set. Additionally, we conduct an empirical benchmark using LACA on 4 publicly available data sets to assess the broader question of how well GPT-3.5 performs across a range of deductive coding tasks. Overall, we find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders. Additionally, we demonstrate that LACA can help refine prompts for deductive coding, identify codes for which an LLM is randomly guessing, and help assess when to use LLMs vs. human coders for deductive coding. We conclude with several implications for future practice of deductive coding and related research methods.

MEApr 8
From Ground Truth to Measurement: A Statistical Framework for Human Labeling

Robert Chew, Stephanie Eckman, Christoph Kern et al.

Supervised machine learning assumes that labeled data provide accurate measurements of the concepts models are meant to learn. Yet in practice, human labeling introduces systematic variation arising from ambiguous items, divergent interpretations, and simple mistakes. Machine learning research commonly treats all disagreement as noise, which obscures these distinctions and limits our understanding of what models actually learn. This paper reframes annotation as a measurement process and introduces a statistical framework for decomposing labeling outcomes into interpretable sources of variation: instance difficulty, annotator bias, situational noise, and relational alignment. The framework extends classical measurement-error models to accommodate both shared and individualized notions of truth, reflecting traditional and human label variation interpretations of error, and provides a diagnostic for assessing which regime better characterizes a given task. Applying the proposed model to a multi-annotator natural language inference dataset, we find empirical evidence for all four theorized components and demonstrate the effectiveness of our approach. We conclude with implications for data-centric machine learning and outline how this approach can guide the development of a more systematic science of labeling.

LGDec 5, 2025
Developing synthetic microdata through machine learning for firm-level business surveys

Jorge Cisneros, Timothy Wojan, Matthew Williams et al.

Public-use microdata samples (PUMS) from the United States (US) Census Bureau on individuals have been available for decades. However, large increases in computing power and the greater availability of Big Data have dramatically increased the probability of re-identifying anonymized data, potentially violating the pledge of confidentiality given to survey respondents. Data science tools can be used to produce synthetic data that preserve critical moments of the empirical data but do not contain the records of any existing individual respondent or business. Developing public-use firm data from surveys presents unique challenges different from demographic data, because there is a lack of anonymity and certain industries can be easily identified in each geographic area. This paper briefly describes a machine learning model used to construct a synthetic PUMS based on the Annual Business Survey (ABS) and discusses various quality metrics. Although the ABS PUMS is currently being refined and results are confidential, we present two synthetic PUMS developed for the 2007 Survey of Business Owners, similar to the ABS business data. Econometric replication of a high impact analysis published in Small Business Economics demonstrates the verisimilitude of the synthetic data to the true data and motivates discussion of possible ABS use cases.

MLMar 27, 2025
Bayesian Pseudo Posterior Mechanism for Differentially Private Machine Learning

Robert Chew, Matthew R. Williams, Elan A. Segarra et al.

Differential privacy (DP) is becoming increasingly important for deployed machine learning applications because it provides strong guarantees for protecting the privacy of individuals whose data is used to train models. However, DP mechanisms commonly used in machine learning tend to struggle on many real world distributions, including highly imbalanced or small labeled training sets. In this work, we propose a new scalable DP mechanism for deep learning models, SWAG-PPM, by using a pseudo posterior distribution that downweights by-record likelihood contributions proportionally to their disclosure risks as the randomized mechanism. As a motivating example from official statistics, we demonstrate SWAG-PPM on a workplace injury text classification task using a highly imbalanced public dataset published by the U.S. Occupational Safety and Health Administration (OSHA). We find that SWAG-PPM exhibits only modest utility degradation against a non-private comparator while greatly outperforming the industry standard DP-SGD for a similar privacy budget.