Label Curation Using Agentic AI
This work addresses the cost, slowness, and variability of human-centric data annotation pipelines for machine learning. The contribution appears incremental, however, since it combines an adapted classical probabilistic model with agentic coordination.
The paper tackles the problem of producing accurate, scalable labels for supervised learning. It introduces AURA, an agentic AI framework that coordinates multiple AI agents to generate and validate labels without ground truth. AURA achieves accuracy improvements of up to 5.8% over baseline on four benchmark datasets, and up to 50% in challenging settings with poor-quality annotators.
Data annotation is essential for supervised learning, yet producing accurate, unbiased, and scalable labels remains challenging as datasets grow in size and modality. Traditional human-centric pipelines are costly, slow, and prone to annotator variability, motivating reliability-aware automated annotation. We present AURA (Agentic AI for Unified Reliability Modeling and Annotation Aggregation), an agentic AI framework for large-scale, multi-modal data annotation. AURA coordinates multiple AI agents to generate and validate labels without requiring ground truth. At its core, AURA adapts a classical probabilistic model that jointly infers latent true labels and annotator reliability via confusion matrices, using Expectation-Maximization to reconcile conflicting annotations and aggregate noisy predictions. Across the four benchmark datasets evaluated, AURA achieves accuracy improvements of up to 5.8% over the baseline. In more challenging settings with poor-quality annotators, the improvement is up to 50% over the baseline. AURA also accurately estimates the reliability of annotators, allowing annotator quality to be assessed without any pre-validation step.
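The aggregation step described above (latent true labels, per-annotator confusion matrices, Expectation-Maximization) can be illustrated with a minimal sketch. It follows the classical Dawid-Skene formulation, which is an assumption here: the abstract does not name the model or say how AURA modifies it. The function name `aggregate_labels`, the `smoothing` constant, the fixed iteration count, and the toy votes are all hypothetical, and every item is assumed to have at least one vote.

```python
import math

def aggregate_labels(votes, n_classes, n_iter=50, smoothing=0.01):
    """Dawid-Skene style EM (illustrative sketch, not AURA's actual code).

    votes: one list per item, holding (annotator_id, label) pairs.
    Returns (estimated_labels, confusion), where confusion[a][k][l] is the
    estimated probability that annotator a reports l when the truth is k.
    """
    n_annotators = 1 + max(a for item in votes for a, _ in item)

    # Initialise posteriors from vote fractions (a soft majority vote).
    post = []
    for item in votes:
        counts = [0.0] * n_classes
        for _, label in item:
            counts[label] += 1.0
        total = sum(counts)
        post.append([c / total for c in counts])

    for _ in range(n_iter):
        # M-step: class priors and row-normalised confusion matrices.
        prior = [sum(p[k] for p in post) / len(post) for k in range(n_classes)]
        confusion = [[[smoothing] * n_classes for _ in range(n_classes)]
                     for _ in range(n_annotators)]
        for p, item in zip(post, votes):
            for a, label in item:
                for k in range(n_classes):
                    confusion[a][k][label] += p[k]
        for a in range(n_annotators):
            for k in range(n_classes):
                row_sum = sum(confusion[a][k])
                confusion[a][k] = [v / row_sum for v in confusion[a][k]]

        # E-step: posterior over each item's true label, in log space.
        new_post = []
        for item in votes:
            log_p = [math.log(prior[k] + 1e-12) for k in range(n_classes)]
            for a, label in item:
                for k in range(n_classes):
                    log_p[k] += math.log(confusion[a][k][label])
            top = max(log_p)
            weights = [math.exp(v - top) for v in log_p]
            norm = sum(weights)
            new_post.append([w / norm for w in weights])
        post = new_post

    labels = [max(range(n_classes), key=lambda k: p[k]) for p in post]
    return labels, confusion

# Toy usage: annotators 0 and 1 agree, annotator 2 always disagrees with them.
toy_votes = [[(0, t), (1, t), (2, 1 - t)] for t in [0, 1, 0, 1, 0, 1]]
labels, confusion = aggregate_labels(toy_votes, n_classes=2)
print(labels)
print(confusion[2])  # annotator 2's estimated confusion matrix
```

No ground truth enters the computation: reliability is inferred purely from agreement patterns, which is what makes it possible to rate annotators without a validation set. The sketch also inherits a known limitation of this model family: if most annotators are systematically wrong in the same way, the majority-vote initialisation can lock in the wrong labeling.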