CL · AI · Apr 3

Domain-Adapted Retrieval for In-Context Annotation of Pedagogical Dialogue Acts

Jinsook Lee, Kirk Vanacore, Zhuqian Zhou, Bakhtawar Ahtisham, Rene F. Kizilcec

arXiv:2604.03127 · 80.6

Predicted impact: top 68% in CL · last 90 days · Originality: Incremental advance

AI Analysis

The paper addresses the high-stakes task of pedagogical dialogue annotation for education and AI tutoring systems. It offers a practical, incremental improvement by adapting retrieval while keeping the generative model frozen.

The paper tackles automated annotation of pedagogical dialogue acts, a task where LLMs often fail without domain grounding. It presents a domain-adapted RAG pipeline that fine-tunes an embedding model and indexes dialogues at the utterance level. The best configuration achieves Cohen's κ of 0.526-0.580 on TalkMoves and 0.659-0.743 on Eedi, substantially outperforming no-retrieval baselines (κ = 0.275-0.413 and 0.160-0.410, respectively).
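For readers unfamiliar with the metric reported above, Cohen's κ measures agreement between two annotators after correcting for agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula, not the paper's evaluation code; the function name and the placeholder labels "A" and "B" are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: chance overlap given each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Placeholder labels (hypothetical, not the TalkMoves or Eedi label sets).
human = ["A", "A", "B", "B"]
model = ["A", "B", "B", "B"]
kappa = cohens_kappa(human, model)  # observed 0.75, expected 0.5
```

In this toy case the annotators agree on three of four items, but half of that agreement is expected by chance, so κ is 0.5.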

Automated annotation of pedagogical dialogue is a high-stakes task where LLMs often fail without sufficient domain grounding. We present a domain-adapted RAG pipeline for tutoring move annotation. Rather than fine-tuning the generative model, we adapt retrieval by fine-tuning a lightweight embedding model on tutoring corpora and indexing dialogues at the utterance level to retrieve labeled few-shot demonstrations. Evaluated across two real tutoring dialogue datasets (TalkMoves and Eedi) and three LLM backbones (GPT-5.2, Claude Sonnet 4.6, Qwen3-32b), our best configuration achieves Cohen's $\kappa$ of 0.526-0.580 on TalkMoves and 0.659-0.743 on Eedi, substantially outperforming no-retrieval baselines ($\kappa = 0.275$-$0.413$ and $0.160$-$0.410$). An ablation study reveals that utterance-level indexing, rather than embedding quality alone, is the primary driver of these gains, with top-1 label match rates improving from 39.7\% to 62.0\% on TalkMoves and 52.9\% to 73.1\% on Eedi under domain-adapted retrieval. Retrieval also corrects systematic label biases present in zero-shot prompting and yields the largest improvements for rare and context-dependent labels. These findings suggest that adapting the retrieval component alone is a practical and effective path toward expert-level pedagogical dialogue annotation while keeping the generative model frozen.
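The retrieval step described in the abstract can be illustrated with a rough sketch: index labeled utterances individually, retrieve the nearest ones for a new utterance, and place them in the prompt as few-shot demonstrations. Everything here is an assumption made for illustration. The bag-of-words `embed` function stands in for the paper's fine-tuned embedding model, and the example utterances, the labels `LABEL_A` to `LABEL_C`, and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a fine-tuned embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[token] * v[token] for token in u if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_index(labeled_utterances):
    """Index one entry per utterance (not per dialogue), keeping its label."""
    return [(embed(text), text, label) for text, label in labeled_utterances]


def retrieve(index, query, k=2):
    """Return the k labeled utterances most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[0]), reverse=True)
    return [(text, label) for _, text, label in ranked[:k]]


def build_prompt(demos, query):
    """Format retrieved demonstrations followed by the unlabeled utterance."""
    parts = ["Label the tutor utterance with one dialogue act."]
    for text, label in demos:
        parts.append(f"Utterance: {text}\nLabel: {label}")
    parts.append(f"Utterance: {query}\nLabel:")
    return "\n\n".join(parts)


# Hypothetical labeled utterances; real labels would come from the annotated corpus.
EXAMPLES = [
    ("Can you explain why you multiplied by two?", "LABEL_A"),
    ("Great job, that is correct.", "LABEL_B"),
    ("What do you think the next step is?", "LABEL_C"),
]

index = build_index(EXAMPLES)
query = "Can you explain why you divided by two?"
demos = retrieve(index, query, k=1)
prompt = build_prompt(demos, query)
```

The resulting prompt would be sent to a frozen LLM, so only the embedding and indexing choices change between configurations. Storing one entry per utterance means each retrieved neighbor carries exactly one label, which is the property the ablation credits for the gains.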
