AI SI Oct 28, 2025

Generative Large Language Models (gLLMs) in Content Analysis: A Practical Guide for Communication Research

Daria Kravets-Meinke, Hannah Schmid-Petri, Sonja Niemann, Ute Schmid

arXiv:2510.24337v1, 1 citation, h-index: 11

Originality: Synthesis-oriented

AI Analysis

The paper offers communication researchers a practical guide to integrating gLLMs into their methodological toolkit, making automated content analysis more accessible and reliable. Its contribution is incremental: it synthesizes existing research into a framework.

The paper addresses the use of generative large language models (gLLMs) for content analysis in communication research. It reports that gLLMs can outperform crowd workers and trained coders on various coding tasks, often at a fraction of the time and cost, while also decoding implicit meanings. It then provides a best-practice guide for challenges such as prompt engineering and validation.

Generative Large Language Models (gLLMs), such as ChatGPT, are increasingly being used in communication research for content analysis. Studies show that gLLMs can outperform both crowd workers and trained coders, such as research assistants, on various coding tasks relevant to communication science, often at a fraction of the time and cost. Additionally, gLLMs can decode implicit meanings and contextual information, be instructed using natural language, deployed with only basic programming skills, and require little to no annotated data beyond a validation dataset - constituting a paradigm shift in automated content analysis. Despite their potential, the integration of gLLMs into the methodological toolkit of communication research remains underdeveloped. In gLLM-assisted quantitative content analysis, researchers must address at least seven critical challenges that impact result quality: (1) codebook development, (2) prompt engineering, (3) model selection, (4) parameter tuning, (5) iterative refinement, (6) validation of the model's reliability, and optionally, (7) performance enhancement. This paper synthesizes emerging research on gLLM-assisted quantitative content analysis and proposes a comprehensive best-practice guide to navigate these challenges. Our goal is to make gLLM-based content analysis more accessible to a broader range of communication researchers and ensure adherence to established disciplinary quality standards of validity, reliability, reproducibility, and research ethics.
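Challenge (6), validating the model's reliability against a human-coded validation dataset, can be illustrated with a small agreement check. The sketch below is only an illustration under stated assumptions: the abstract does not name a reliability metric, so Cohen's kappa is used here as one common chance-corrected agreement measure, and the function name `cohen_kappa` and the label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(human, model):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: share of units where both coders chose the same label.
    p_o = sum(h == m for h, m in zip(human, model)) / n
    # Expected chance agreement from each coder's marginal label frequencies.
    h_counts, m_counts = Counter(human), Counter(model)
    p_e = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical validation sample (assumption, not data from the paper):
# human gold labels versus labels returned by a gLLM for the same units.
human_labels = ["pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu"]
model_labels = ["pos", "neg", "pos", "pos", "neu", "neg", "pos", "neg"]

kappa = cohen_kappa(human_labels, model_labels)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this toy sample the two label lists agree on 6 of 8 units. Kappa discounts the agreement expected by chance, so it comes out lower than the raw 75% agreement rate. Which threshold counts as acceptable depends on the standards of the field and is not specified in the abstract.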
