GenAI-Powered Inference
The framework gives researchers and practitioners a computationally efficient and accessible way to analyze unstructured data for causal and predictive tasks. The contribution appears incremental, however, since it builds on existing generative models rather than introducing a fundamentally new paradigm.
The paper addresses causal and predictive inference with unstructured data such as text and images. It introduces GenAI-Powered Inference (GPI), a framework that uses open-source generative AI models to extract structured representations without fine-tuning, enabling estimation with uncertainty quantification. Applications include social media censorship analysis and electoral outcome prediction.
We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models -- such as large language models and diffusion models -- not only to generate unstructured data at scale but also to extract low-dimensional representations that are guaranteed to capture their underlying structure. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates' facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.
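The step described in the abstract, applying machine learning to low-dimensional representations to estimate an effect with a standard error, can be illustrated with a minimal sketch. It is not the GPI package's API or the authors' estimator. It assumes simulated vectors `R` standing in for representations extracted from a generative model, a binary treatment feature `t`, and a generic cross-fitted doubly robust (AIPW) estimator with ridge and logistic learners. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xc = add_intercept(X)
    penalty = lam * np.eye(Xc.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ y)

def fit_logistic(X, t, lam=1.0, n_iter=25):
    """Ridge-penalized logistic regression fitted by Newton's method."""
    Xc = add_intercept(X)
    penalty = lam * np.eye(Xc.shape[1])
    penalty[0, 0] = 0.0
    beta = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xc @ beta)
        grad = Xc.T @ (p - t) + penalty @ beta
        hess = (Xc * (p * (1.0 - p))[:, None]).T @ Xc + penalty
        beta = beta - np.linalg.solve(hess, grad)
    return beta

def crossfit_aipw(R, t, y, n_folds=5, seed=0):
    """Cross-fitted AIPW estimate of the average effect of t on y given R."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance models are fitted on the training folds only.
        b1 = fit_ridge(R[train & (t == 1)], y[train & (t == 1)])
        b0 = fit_ridge(R[train & (t == 0)], y[train & (t == 0)])
        bp = fit_logistic(R[train], t[train])
        Xt = add_intercept(R[test])
        m1, m0 = Xt @ b1, Xt @ b0
        e = np.clip(sigmoid(Xt @ bp), 0.05, 0.95)  # trim extreme propensities
        # Doubly robust score evaluated on the held-out fold.
        psi[test] = (m1 - m0
                     + t[test] * (y[test] - m1) / e
                     - (1 - t[test]) * (y[test] - m0) / (1 - e))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)

# Simulated data (an assumption): R plays the role of extracted representations.
rng = np.random.default_rng(42)
n, d, true_effect = 2000, 10, 1.0
R = rng.normal(size=(n, d))
t = rng.binomial(1, sigmoid(0.5 * R[:, 0]))  # treatment depends on R
y = true_effect * t + 0.5 * (R @ rng.normal(size=d)) + rng.normal(size=n)

estimate, std_error = crossfit_aipw(R, t, y)
lower, upper = estimate - 1.96 * std_error, estimate + 1.96 * std_error
print(f"estimate={estimate:.3f}, se={std_error:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

In an actual analysis, `R` would come from the internal representations of an open-source generative model rather than from a random number generator, and the linear learners here would likely be replaced by more flexible ones. Cross-fitting is used so that each observation's score is computed from nuisance models that did not see that observation.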