AI
Sep 4, 2025

What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models

arXiv:2509.03827v1
5 citations
h-index: 17
Originality Incremental advance
AI Analysis

This paper addresses the problem of using AI for complex social policymaking in high-stakes domains such as homelessness. The contribution appears incremental: it builds on existing LLM capabilities, adding new benchmarks and simulations.

The paper evaluates whether large language models (LLMs) align with domain experts in informing social policymaking on homelessness alleviation, a challenge affecting over 150 million people. It develops a novel benchmark across four geographies and reports promising potential for LLMs to provide alternative policies at scale, provided responsible guardrails are in place.

Large language models (LLMs) are increasingly being adopted in high-stakes domains. Their capacity to process vast amounts of unstructured data, explore flexible scenarios, and handle a diversity of contextual factors can make them uniquely suited to provide new insights for the complexity of social policymaking. This article evaluates whether LLMs are aligned with domain experts (and among themselves) to inform social policymaking on the subject of homelessness alleviation - a challenge affecting over 150 million people worldwide. We develop a novel benchmark comprising decision scenarios with policy choices across four geographies (South Bend, USA; Barcelona, Spain; Johannesburg, South Africa; Macau SAR, China). The policies in scope are grounded in the conceptual framework of the Capability Approach for human development. We also present an automated pipeline that connects the benchmarked policies to an agent-based model, and we explore the social impact of the recommended policies through simulated social scenarios. The paper's results reveal promising potential to leverage LLMs for social policymaking. If responsible guardrails and contextual calibrations are introduced in collaboration with local domain experts, LLMs can provide humans with valuable insights, in the form of alternative policies at scale.
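
The abstract does not say how alignment between LLMs and domain experts is scored. As a purely illustrative sketch, one simple way to compare policy choices across decision scenarios is a pairwise agreement rate. The function names, rater labels, and data below are hypothetical and are not taken from the paper; the paper's actual metric may differ.

```python
from itertools import combinations


def agreement_rate(choices_a, choices_b):
    """Fraction of scenarios on which two raters picked the same policy."""
    if len(choices_a) != len(choices_b):
        raise ValueError("both raters must answer the same scenarios")
    if not choices_a:
        raise ValueError("at least one scenario is required")
    matches = sum(a == b for a, b in zip(choices_a, choices_b))
    return matches / len(choices_a)


def pairwise_agreement(choices_by_rater):
    """Agreement rate for every unordered pair of raters (expert or model)."""
    return {
        (r1, r2): agreement_rate(choices_by_rater[r1], choices_by_rater[r2])
        for r1, r2 in combinations(sorted(choices_by_rater), 2)
    }


# Hypothetical data: one policy label per decision scenario.
choices = {
    "expert": ["A", "B", "A", "C"],
    "model_x": ["A", "B", "C", "C"],
    "model_y": ["B", "B", "C", "C"],
}

scores = pairwise_agreement(choices)
for pair, score in scores.items():
    print(pair, score)
```

Raw agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa would be a natural alternative when the number of policy options per scenario is small.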

