AI · Jan 11, 2025

Survey Transfer Learning: Recycling Data with Silicon Responses

arXiv:2501.06577v2

Originality: Incremental advance

AI Analysis

This work addresses challenges in social science and polling by offering a more efficient and more accurate way to generate survey data. The advance is incremental, since it adapts existing transfer learning techniques to a specific domain.

The paper tackles the problem of generating synthetic survey data by introducing Survey Transfer Learning (STL), which recycles existing survey data to produce silicon responses with accuracy rates of up to 93%. STL outperforms LLMs, especially on sensitive measures such as racial resentment.

As researchers increasingly turn to large language models (LLMs) to generate synthetic survey data, less attention has been paid to alternative AI paradigms, despite the environmental costs of LLMs. This paper introduces Survey Transfer Learning (STL), which adapts transfer learning paradigms from computer science to survey research in order to recycle existing survey data and generate empirically grounded silicon responses. Inspired by political behavior theory, STL leverages shared demographic variables with high predictive power in a polarized American context to transfer knowledge across surveys. STL pre-trains a neural network on the Cooperative Election Study (CES) 2020, freezes its early layers to preserve the learned structure, and fine-tunes the top layers on the American National Election Studies (ANES) 2020. It generates silicon responses in CES 2022 and in held-out ANES 2020 data with accuracy rates of up to 93 percent. Results show that STL outperforms LLMs, especially on sensitive measures such as racial resentment. While LLM silicon samples are costly and opaque, STL generates empirically grounded silicon responses with high individual-level accuracy, potentially helping to mitigate key challenges in social science and the polling industry.
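The abstract's core recipe (pre-train a network on one survey, freeze its early layers, fine-tune the top layers on a second survey, then predict held-out respondents) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hidden-layer architecture, layer sizes, learning rates, and helper names (`init_params`, `train`, `freeze_early`) are hypothetical, and the covariates and survey items are randomly generated stand-ins rather than CES or ANES data.

```python
import numpy as np

# Hypothetical sketch of freeze-and-fine-tune transfer learning.
# All data below is random toy data, NOT CES or ANES responses.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, rng):
    # Early layer (W1, b1) is meant to hold transferable structure;
    # top layer (W2, b2) maps it to one binary survey item.
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, 1)), "b2": np.zeros(1),
    }

def forward(params, X):
    H = np.tanh(X @ params["W1"] + params["b1"])          # early-layer features
    p = sigmoid(H @ params["W2"] + params["b2"]).ravel()  # P(item = 1)
    return H, p

def bce(p, y, eps=1e-12):
    # Mean binary cross-entropy.
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def train(params, X, y, epochs, lr, freeze_early=False):
    # Full-batch gradient descent on mean BCE; updates params in place.
    # With freeze_early=True only the top layer (W2, b2) is updated.
    n = len(y)
    for _ in range(epochs):
        H, p = forward(params, X)
        d_logit = (p - y)[:, None] / n  # dLoss/dlogit for sigmoid + BCE
        grads = {"W2": H.T @ d_logit, "b2": d_logit.sum(axis=0)}
        if not freeze_early:
            d_hidden = (d_logit @ params["W2"].T) * (1.0 - H ** 2)  # tanh'
            grads["W1"] = X.T @ d_hidden
            grads["b1"] = d_hidden.sum(axis=0)
        for name, g in grads.items():
            params[name] -= lr * g
    return params

rng = np.random.default_rng(0)
d, hidden = 6, 16

# Hypothetical "source survey" (stand-in for a large study such as CES):
# random covariates and a binary item driven by two of them.
X_src = rng.normal(size=(2000, d))
y_src = (X_src[:, 0] + 0.5 * X_src[:, 1] > 0).astype(float)

# Hypothetical "target survey" (stand-in for a smaller study such as ANES):
# same covariates, a related but different item; last 100 rows held out.
X_tgt = rng.normal(size=(400, d))
y_tgt = (X_tgt[:, 0] - 0.3 * X_tgt[:, 2] > 0).astype(float)
X_ft, y_ft, X_test, y_test = X_tgt[:300], y_tgt[:300], X_tgt[300:], y_tgt[300:]

# 1) Pre-train every layer on the source survey.
pretrained = train(init_params(d, hidden, rng), X_src, y_src, epochs=300, lr=0.5)

# 2) Copy the weights, freeze the early layer, fine-tune only the top layer.
finetuned = {k: v.copy() for k, v in pretrained.items()}
loss_before = bce(forward(finetuned, X_ft)[1], y_ft)
train(finetuned, X_ft, y_ft, epochs=200, lr=0.1, freeze_early=True)
loss_after = bce(forward(finetuned, X_ft)[1], y_ft)

# 3) Individual-level accuracy of the generated responses on held-out rows.
_, p_test = forward(finetuned, X_test)
accuracy = np.mean((p_test > 0.5) == (y_test == 1.0))
print(f"fine-tuning loss {loss_before:.3f} -> {loss_after:.3f}, held-out accuracy {accuracy:.2f}")
```

Freezing the early layer means the target survey only has to fit the small top layer, so a smaller sample can reuse structure learned from the larger source survey. Which layers to freeze, and how many, is a design choice this sketch fixes arbitrarily; because the toy labels are synthetic, the printed numbers say nothing about STL's reported accuracy.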

