CL | Jul 7, 2025

Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification

ETH Zurich
arXiv:2507.05010v1 | 2 citations | h-index: 40 | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving text classification accuracy for domain experts by incrementally refining annotation rules in collaboration with LLMs.

The paper tackles the problem of identifying edge cases in text classification by introducing Co-DETECT, a mixed-initiative framework that combines human expertise with LLM-guided annotation to iteratively improve codebooks. According to the authors' user study and analyses, this leads to more effective handling of nuanced phenomena.

We introduce Co-DETECT (Collaborative Discovery of Edge cases in TExt ClassificaTion), a novel mixed-initiative annotation framework that integrates human expertise with automatic annotation guided by large language models (LLMs). Co-DETECT starts with an initial, sketch-level codebook and dataset provided by a domain expert, then leverages the LLM to annotate the data and identify edge cases that are not well described by the initial codebook. Specifically, Co-DETECT flags challenging examples, induces high-level, generalizable descriptions of edge cases, and assists the user in incorporating edge-case handling rules to improve the codebook. This iterative process enables more effective handling of nuanced phenomena through compact, generalizable annotation rules. An extensive user study, together with qualitative and quantitative analyses, demonstrates the effectiveness of Co-DETECT.
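The loop described in the abstract (annotate, flag challenging examples, induce edge-case descriptions, fold approved rules back into the codebook) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (`Codebook`, `flag_edge_cases`, `induce_descriptions`, `refine`, and the `toy_*` stand-ins) is hypothetical, and flagging by a confidence threshold is an assumption made for illustration, since the abstract does not say how challenging examples are detected. The LLM and the human expert are replaced by plain functions so the example runs on its own.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Codebook:
    """Sketch-level codebook: a task description plus accumulated edge-case rules."""
    task: str
    rules: List[str] = field(default_factory=list)


def flag_edge_cases(codebook, texts, annotate, threshold=0.6):
    """Annotate every text; return those whose confidence falls below the threshold.

    Assumption: `annotate(codebook, text)` returns (label, confidence in [0, 1]).
    """
    return [t for t in texts if annotate(codebook, t)[1] < threshold]


def induce_descriptions(flagged, describe):
    """Group flagged texts under a shared high-level edge-case description."""
    groups = defaultdict(list)
    for text in flagged:
        groups[describe(text)].append(text)
    return dict(groups)


def refine(codebook, texts, annotate, describe, approve, max_rounds=3):
    """Iterate until nothing is flagged, no rule is added, or max_rounds is hit."""
    for _ in range(max_rounds):
        flagged = flag_edge_cases(codebook, texts, annotate)
        if not flagged:
            break
        added = False
        for description, examples in induce_descriptions(flagged, describe).items():
            rule = approve(description, examples)  # the expert may return None
            if rule and rule not in codebook.rules:
                codebook.rules.append(rule)
                added = True
        if not added:
            break
    return codebook


# Toy stand-ins for the LLM annotator, the description step, and the expert.
def toy_annotate(codebook, text):
    lowered = text.lower()
    if "yeah right" in lowered:
        # Sarcasm stays uncertain until some rule in the codebook covers it.
        covered = any("sarcasm" in rule.lower() for rule in codebook.rules)
        return ("negative", 0.9 if covered else 0.3)
    return ("positive" if "great" in lowered else "negative", 0.9)


def toy_describe(text):
    return "sarcastic praise"


def toy_approve(description, examples):
    return f"Treat {description} (sarcasm) as negative."


texts = ["Great phone.", "Yeah right, great battery.", "Terrible screen."]
cb = refine(Codebook(task="Sentiment of product reviews"),
            texts, toy_annotate, toy_describe, toy_approve)
print(cb.rules)
```

In this toy run the sarcastic review is flagged in the first round, one rule is added, and the second round flags nothing, so the loop stops. In a real setting the three callables would wrap LLM calls and an interactive review step.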

