AIAP · Apr 19

Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys

arXiv:2604.17267 · 78.6 · h-index: 15
Predicted impact: top 27% in AI · last 90 days
Originality: Incremental advance
AI Analysis

For survey researchers and practitioners, this work provides a principled method to combine LLM and human responses, reducing cost while maintaining accuracy.

The paper addresses the problem of optimally allocating a fixed budget of human respondents across survey questions when cheap but unreliable LLM predictions are available. It proposes a framework that combines Prediction-Powered Inference, a closed-form optimal allocation rule, and a meta-learning approach that predicts rectification difficulty. On two validation datasets, it achieves MSE reductions of 11.4% and 10.5% without any pilot human data for the target survey.
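Prediction-Powered Inference can be illustrated on the simplest estimand: the population mean of a single survey question. The sketch below is not the authors' code. It makes three assumptions: the LLM supplies a prediction for each human respondent (`f_human`), it also supplies predictions for a separate and larger pool of simulated respondents (`f_llm`), and the sample variance of the rectifier Y − f stands in for the paper's rectification difficulty. The function and argument names are hypothetical.

```python
import numpy as np

def ppi_mean(y_human, f_human, f_llm):
    """Hypothetical PPI sketch for one question's population mean.

    y_human: human responses for the n labeled respondents
    f_human: LLM predictions for those same n respondents
    f_llm:   LLM predictions for a separate pool of N respondents (N >> n)
    Returns (estimate, estimated variance, estimated rectification difficulty).
    """
    y_human = np.asarray(y_human, dtype=float)
    f_human = np.asarray(f_human, dtype=float)
    f_llm = np.asarray(f_llm, dtype=float)
    n, N = len(y_human), len(f_llm)

    rectifier = y_human - f_human            # per-respondent LLM error
    theta = f_llm.mean() + rectifier.mean()  # LLM mean, bias-corrected by humans

    # Assumption: rectification difficulty ~ Var(Y - f), estimated from humans.
    difficulty = rectifier.var(ddof=1)
    # Variance = LLM-pool term + rectifier term; the second dominates when N >> n.
    var = f_llm.var(ddof=1) / N + difficulty / n
    return theta, var, difficulty
```

When the LLM pool is large, the first variance term is negligible, and the variance falls roughly as difficulty / n. A question whose LLM predictions err more therefore needs more human responses to reach the same precision.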

Large Language Models can generate synthetic survey responses at low cost, but their accuracy varies unpredictably across questions. We study the design problem of allocating a fixed budget of human respondents across estimation tasks when cheap LLM predictions are available for every task. Our framework combines three components. First, building on Prediction-Powered Inference, we characterize a question-specific rectification difficulty that governs how quickly the estimator's variance decreases with human sample size. Second, we derive a closed-form optimal allocation rule that directs more human labels to tasks where the LLM is least reliable. Third, since rectification difficulty depends on unobserved human responses for new surveys, we propose a meta-learning approach, trained on historical data, that predicts it for entirely new tasks without pilot data. The framework extends to general M-estimation, covering regression coefficients and multinomial logit partworths for conjoint analysis. We validate the framework on two datasets spanning different domains, question types, and LLMs, showing that our approach captures 61-79% of the theoretically attainable efficiency gains, achieving 11.4% and 10.5% MSE reductions without requiring any pilot human data for the target survey.
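The allocation and meta-learning steps above can be sketched under stated assumptions. Suppose each question's estimator variance is approximately d_q / n_q. Minimizing the total Σ_q d_q / n_q subject to Σ_q n_q = B then gives the stationarity condition −d_q / n_q² + λ = 0, so n_q = B·√d_q / Σ_j √d_j. This is a Neyman-style rule. The paper's exact closed form may include question weights or additional terms. The meta-learner is a hypothetical stand-in: a ridge regression of log-difficulty on numeric question features drawn from historical surveys. The paper's actual features and model are not described in this summary, and all function names here are illustrative.

```python
import numpy as np

def optimal_allocation(difficulty, budget):
    """Split `budget` human respondents across questions with n_q ∝ sqrt(d_q).

    Solves the continuous problem min sum_q d_q / n_q s.t. sum_q n_q = budget,
    then rounds by largest remainder so the integers sum exactly to `budget`.
    Assumes all difficulties are positive.
    """
    w = np.sqrt(np.asarray(difficulty, dtype=float))
    ideal = budget * w / w.sum()              # continuous optimum
    n = np.floor(ideal).astype(int)
    leftover = budget - n.sum()
    # Hand the remaining respondents to the largest fractional parts.
    order = np.argsort(ideal - n)[::-1]
    n[order[:leftover]] += 1
    return n

def fit_difficulty_model(features, difficulties, alpha=1.0):
    """Hypothetical meta-learner: ridge regression of log-difficulty on features."""
    X = np.column_stack([np.ones(len(features)), features])
    y = np.log(np.asarray(difficulties, dtype=float))
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0                       # do not shrink the intercept
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def predict_difficulty(beta, features):
    """Predict difficulty for new questions without any pilot human data."""
    X = np.column_stack([np.ones(len(features)), features])
    return np.exp(X @ beta)
```

For example, predicted difficulties of 1, 4, 9 and 16 with B = 100 yield allocations of 10, 20, 30 and 40. In practice, one would likely also enforce a small per-question minimum, because estimating a question's variance needs at least two human responses.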

