CL | Jun 30, 2025

Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning

Seungjun Yi, Joakim Nguyen, Huimin Xu, Terence Lim, Andrew Well, Mia Markey, Ying Ding

arXiv:2506.23998v2 | 13.09 citations | h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the challenge of analyzing large qualitative datasets of patient and caregiver experiences in clinical settings. It offers a scalable solution, though an incremental one, since it automates existing thematic analysis methods rather than introducing a new one.

The authors address the labor-intensive, unscalable nature of manual thematic analysis of clinical narratives in congenital heart disease by proposing a fully automated large language model pipeline. A multi-agent framework, with optional reinforcement learning from human feedback, is used to improve theme quality while keeping the analysis scalable.

Congenital heart disease (CHD) presents complex, lifelong challenges often underrepresented in traditional clinical metrics. While unstructured narratives offer rich insights into patient and caregiver experiences, manual thematic analysis (TA) remains labor-intensive and unscalable. We propose a fully automated large language model (LLM) pipeline that performs end-to-end TA on clinical narratives, which eliminates the need for manual coding or full transcript review. Our system employs a novel multi-agent framework, where specialized LLM agents assume roles to enhance theme quality and alignment with human analysis. To further improve thematic relevance, we optionally integrate reinforcement learning from human feedback (RLHF). This supports scalable, patient-centered analysis of large qualitative datasets and allows LLMs to be fine-tuned for specific clinical contexts.
