CL · Mar 6

Wisdom of the AI Crowd (AI-CROWD) for Ground Truth Approximation in Content Analysis: A Research Protocol & Validation Using Eleven Large Language Models

Luis de-Marcos, Manuel Goyanes, Adrián Domínguez-Díaz

arXiv:2603.06197v16.4 · h-index: 20

Predicted impact: top 93% in CL · last 90 days

Originality: Incremental advance

AI Analysis

The protocol addresses the high cost and inconsistency of human coding for massive datasets, though the advance is incremental because it builds on existing ensemble methods with LLMs.

The paper tackles the problem of missing ground truth labels in large-scale content analysis by introducing the AI-CROWD protocol, which approximates ground truth using an ensemble of eleven large language models to generate consensus-based labels through majority voting and diagnostic metrics.

Large-scale content analysis is increasingly limited by the absence of observable ground truth or gold-standard labels, as creating such benchmarks through extensive human coding becomes impractical for massive datasets due to high time, cost, and consistency challenges. To overcome this barrier, we introduce the AI-CROWD protocol, which approximates ground truth by leveraging the collective outputs of an ensemble of large language models (LLMs). Rather than asserting that the resulting labels are true ground truth, the protocol generates a consensus-based approximation derived from convergent and divergent inferences across multiple models. By aggregating outputs via majority voting and interrogating agreement/disagreement patterns with diagnostic metrics, AI-CROWD identifies high-confidence classifications while flagging potential ambiguity or model-specific biases.
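The aggregation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `majority_vote`, the `min_agreement` threshold of 0.6, the tie-handling rule, and the sample labels are all assumptions made for illustration, and the agreement ratio stands in for whatever diagnostic metrics the protocol actually uses.

```python
from collections import Counter


def majority_vote(labels, min_agreement=0.6):
    """Aggregate one item's labels from several models.

    labels: list with one label per model (hypothetical input format).
    min_agreement: assumed cutoff below which the item is flagged.
    """
    counts = Counter(labels)
    ranked = counts.most_common()
    top_label, top_count = ranked[0]

    # Share of models that voted for the winning label.
    agreement = top_count / len(labels)

    # A tie for first place means there is no consensus label.
    tie = len(ranked) > 1 and ranked[1][1] == top_count

    return {
        "label": None if tie else top_label,
        "agreement": agreement,
        "flagged": tie or agreement < min_agreement,
    }


# Hypothetical outputs from an ensemble of eleven models for two documents.
votes = {
    "doc1": ["pos"] * 9 + ["neg"] * 2,
    "doc2": ["pos"] * 5 + ["neg"] * 4 + ["neu"] * 2,
}

results = {doc: majority_vote(v) for doc, v in votes.items()}
for doc, res in results.items():
    print(doc, res)
```

In this sketch, `doc1` receives a high-agreement consensus label, while `doc2` still gets a plurality label but is flagged for review because fewer than 60% of the models agree on it.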
