CL HC LG

Apr 2, 2025

Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets

John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, Michael Horn

arXiv:2504.02887v1

6.73 citations

h-index: 5

Proceedings of the International Conference on Computer-Supported Collaborative Learning

Originality: Incremental advance

AI Analysis

This work addresses the challenge of scaling open coding for large discourse datasets, offering insights for qualitative researchers, but it is incremental as it builds on existing ML/GAI explorations without major breakthroughs.

The study compared open coding results from five ML/GAI approaches and four human coders on an online chat dataset, finding that AI effectively identifies content-based codes while humans excel in interpreting conversational dynamics, highlighting complementary potential.

Open coding, a key inductive step in qualitative research, discovers and constructs concepts from human datasets. However, capturing extensive and nuanced aspects or "coding moments" can be challenging, especially with large discourse datasets. While some studies explore machine learning (ML)/Generative AI (GAI)'s potential for open coding, few evaluation studies exist. We compare open coding results by five recently published ML/GAI approaches and four human coders, using a dataset of online chat messages around a mobile learning software. Our systematic analysis reveals ML/GAI approaches' strengths and weaknesses, uncovering the complementary potential between humans and AI. Line-by-line AI approaches effectively identify content-based codes, while humans excel in interpreting conversational dynamics. We discuss how embedded analytical processes could shape the results of ML/GAI approaches. Instead of replacing humans in open coding, researchers should integrate AI with and according to their analytical processes, e.g., as parallel co-coders.
