QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models
QSTN gives researchers a tool for improving reproducibility and reliability in LLM-based survey and annotation tasks, though it is an incremental framework improvement rather than a fundamental breakthrough.
The researchers address unreliable questionnaire inference with large language models by developing QSTN, a modular framework for systematically evaluating question presentation and response generation methods. Across more than 40 million survey responses, they show that these factors significantly affect alignment with human answers, while also reducing computational cost.
We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and annotation tasks with large language models (LLMs). QSTN enables robust evaluation of questionnaire presentation, prompt perturbations, and response generation methods. Our extensive evaluation ($>40$ million survey responses) shows that question structure and response generation methods have a significant impact on the alignment of generated survey responses with human answers, and that such responses can be obtained for a fraction of the compute cost. In addition, we offer a no-code user interface that allows researchers to set up robust experiments with LLMs without coding knowledge. We hope that QSTN will support the reproducibility and reliability of LLM-based research in the future.
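To make the idea of a prompt perturbation concrete, the following is a minimal sketch of one common variant: changing the order in which answer options are presented while keeping a mapping back to the original options. It uses only the Python standard library and does not use or reproduce the QSTN API; the names `render_question` and `decode_answer`, the letter-label format, and the instruction line are all assumptions made for illustration.

```python
import random
from string import ascii_uppercase


def render_question(question, options, seed=None, reverse=False):
    """Render one multiple-choice question as a prompt string.

    Hypothetical helper, not part of QSTN. Returns the prompt and a
    dict mapping each letter label to the option it stands for.
    """
    opts = list(options)  # copy so the caller's list is not modified
    if reverse:
        opts.reverse()  # perturbation 1: reversed option order
    if seed is not None:
        random.Random(seed).shuffle(opts)  # perturbation 2: seeded shuffle
    lines = [question]
    lines += [f"{label}. {opt}" for label, opt in zip(ascii_uppercase, opts)]
    lines.append("Answer with a single letter.")
    key = dict(zip(ascii_uppercase, opts))
    return "\n".join(lines), key


def decode_answer(letter, key):
    """Map a model's letter answer back to the option text."""
    return key[letter.strip().upper()]


if __name__ == "__main__":
    prompt, key = render_question(
        "How satisfied are you with your job?",
        ["Low", "Medium", "High"],
        reverse=True,
    )
    print(prompt)
    print(decode_answer("a", key))
```

Decoding answers through the returned mapping keeps responses comparable across perturbed presentations, since the same letter can refer to different options in different variants.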