CY CL | Sep 29, 2025

Learning from Convenience Samples: A Case Study on Fine-Tuning LLMs for Survey Non-response in the German Longitudinal Election Study

arXiv:2509.25063v1 | 2 citations | h-index: 10 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses two challenges facing survey researchers, missing data and costly probability samples, by offering a practical method for inference with non-probability samples. The contribution is incremental: it applies existing fine-tuning techniques to a specific domain.

The researchers tackled survey nonresponse by fine-tuning LLMs on convenience samples to impute missing vote choice data. Fine-tuned small LLMs (3B to 8B parameters) outperformed zero-shot approaches and often matched or exceeded tabular classifiers in accuracy under both random and systematic nonresponse.

Survey researchers face two key challenges: the rising costs of probability samples and missing data (e.g., non-response or attrition), which can undermine inference and increase the use of convenience samples. Recent work explores using large language models (LLMs) to simulate respondents via persona-based prompts, often without labeled data. We study a more practical setting where partial survey responses exist: we fine-tune LLMs on available data to impute self-reported vote choice under both random and systematic nonresponse, using the German Longitudinal Election Study. We compare zero-shot prompting and supervised fine-tuning against tabular classifiers (e.g., CatBoost) and test how different convenience samples (e.g., students) used for fine-tuning affect generalization. Our results show that when data are missing completely at random, fine-tuned LLMs match tabular classifiers but outperform zero-shot approaches. When only biased convenience samples are available, fine-tuning small (3B to 8B) open-source LLMs can recover both individual-level predictions and population-level distributions more accurately than zero-shot and often better than tabular methods. This suggests fine-tuned LLMs offer a promising strategy for researchers working with non-probability samples or systematic missingness, and may enable new survey designs requiring only easily accessible subpopulations.
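The setup described in the abstract can be illustrated with a small sketch. It is not the authors' code: the respondent fields (`age`, `is_student`, `left_right`, `vote`), the prompt wording, and the use of total variation distance as the population-level metric are all assumptions made for illustration. It shows how labels might be withheld under random versus systematic nonresponse, how a tabular record could be turned into a prompt/completion pair for supervised fine-tuning, and how individual-level and population-level errors differ.

```python
import random
from collections import Counter

# Hypothetical respondent records. Field names and party list are
# illustrative only and are not the GLES variables used in the paper.
def make_respondents(n, seed=0):
    rng = random.Random(seed)
    parties = ["CDU/CSU", "SPD", "Greens"]
    return [
        {
            "age": rng.randint(18, 80),
            "is_student": rng.random() < 0.2,
            "left_right": rng.randint(1, 11),
            "vote": rng.choice(parties),
        }
        for _ in range(n)
    ]

def split_mcar(rows, missing_rate, seed=0):
    """Missing completely at random: each label is dropped with equal probability."""
    rng = random.Random(seed)
    observed, missing = [], []
    for row in rows:
        (missing if rng.random() < missing_rate else observed).append(row)
    return observed, missing

def split_convenience(rows, in_sample):
    """Systematic nonresponse: labels exist only for a convenience subgroup."""
    observed = [r for r in rows if in_sample(r)]
    missing = [r for r in rows if not in_sample(r)]
    return observed, missing

def to_prompt(row):
    """Serialize one record as text (assumed template, not the paper's)."""
    student = "yes" if row["is_student"] else "no"
    return (
        f"Age: {row['age']}. Student: {student}. "
        f"Left-right self-placement: {row['left_right']}. Vote choice:"
    )

def to_sft_example(row):
    """Prompt/completion pair for supervised fine-tuning on observed labels."""
    return {"prompt": to_prompt(row), "completion": " " + row["vote"]}

def accuracy(y_true, y_pred):
    """Individual-level metric: share of respondents imputed correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def total_variation(y_true, y_pred):
    """Population-level metric: half the L1 distance between vote shares."""
    c_true, c_pred = Counter(y_true), Counter(y_pred)
    labels = set(c_true) | set(c_pred)
    return 0.5 * sum(
        abs(c_true[l] / len(y_true) - c_pred[l] / len(y_pred)) for l in labels
    )
```

For example, `split_convenience(rows, lambda r: r["is_student"])` keeps labels only for students, and mapping `to_sft_example` over the observed rows yields training pairs; the withheld rows serve as the imputation targets. The two metrics can disagree: predictions that swap two respondents' votes have low accuracy but zero total variation distance, which is why the abstract reports individual-level and population-level recovery separately.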

