John M. Anderies

2 Papers

17.8 · CY · Apr 19
All Public Voices Are Equal, But Are Some More Equal Than Others to LLMs?

Sola Kim, Marco A. Janssen, Jieshu Wang et al.

Federal agencies are increasingly deploying large language models (LLMs) to process public comments submitted during notice-and-comment rulemaking, the primary mechanism through which citizens influence federal regulation. Whether these systems treat all public input equally remains largely untested. Using a counterfactual design, we held comment content constant and varied only the commenter's demographic attribution (race, gender, and socioeconomic status) to test whether eight LLMs available for federal use produce different summaries of identical comments. We processed 182 public comments across 32 identity conditions, generating over 106,000 summaries. Occupation was the only identity signal to produce consistent differential treatment: when the same comment was attributed to a street vendor rather than a financial analyst, its summary preserved less of the original meaning, used simpler language, and shifted in emotional tone. This pattern held across all names, prompts, models, and regulatory contexts tested. Race effects were inconsistent and appeared to be driven by specific name tokens rather than by racial categories; gender effects were absent. Writing quality predicted summarization outcomes through argument substance rather than surface mechanics; experimentally injected spelling and grammar errors had negligible effects. The magnitude of occupation-based differential treatment varied by model provider, so selecting a model implicitly selects a level of fairness, a dimension that current procurement frameworks such as FedRAMP do not evaluate. These findings suggest that socioeconomic signals warrant attention in AI fairness assessments for government information systems, and that fairness benchmarks could be incorporated into existing federal IT procurement processes.

CY · Dec 29, 2025
How Large Language Models Systematically Misrepresent American Climate Opinions

Sola Kim, Jieshu Wang, Marco A. Janssen et al.

Federal agencies and researchers increasingly use large language models to analyze and simulate public opinion. When AI mediates between the public and policymakers, accuracy across intersecting identities becomes consequential: inaccurate group-level estimates can mislead outreach, consultation, and policy design. Although prior research has examined intersectionality in LLM outputs, no study has compared these outputs against real human responses across intersecting identities. The gap is particularly urgent for climate policy, where public opinion is contested and diverse. We investigate how LLMs represent intersectional patterns in U.S. climate opinions. We prompted six LLMs with the profiles of 978 respondents from a nationally representative U.S. climate opinion survey and compared the AI-generated responses with the respondents' actual answers across 20 questions. We find that LLMs appear to compress the diversity of American climate opinions, predicting less-concerned groups as more concerned and more-concerned groups as less concerned. This compression is intersectional: LLMs apply uniform gender assumptions that match reality for White and Hispanic Americans but misrepresent Black Americans, among whom actual gender patterns differ. These patterns, which may be invisible to standard auditing approaches, could undermine equitable climate governance.