Next Generation Active Learning: Mixture of LLMs in the Loop
This work addresses the challenge of reducing annotation costs in machine learning applications by improving LLM-based labeling, though the contribution is incremental, as it builds on existing active learning and ensemble methods.
The paper tackles the problem of low-quality labels from large language models (LLMs) in active learning by proposing a Mixture-of-LLMs framework that aggregates multiple LLMs to enhance annotation robustness, achieving performance comparable to human annotation and outperforming single-LLM and other ensemble baselines.
Owing to their rapid advancement and strong generalization capabilities, large language models (LLMs) have been increasingly incorporated into active learning pipelines as annotators to reduce annotation costs. However, LLM-generated labels are often not accurate enough for real-world applications. To address this, we propose a novel active learning framework, Mixture of LLMs in the Loop Active Learning, which replaces human annotators with labels produced by a Mixture-of-LLMs-based annotation model and makes LLM-based annotation more robust by aggregating the strengths of multiple LLMs. To further mitigate the impact of noisy labels, we introduce annotation discrepancy and negative learning to identify unreliable annotations and improve learning effectiveness. Extensive experiments show that our framework achieves performance comparable to human annotation and consistently outperforms single-LLM baselines and other LLM-ensemble-based approaches. Moreover, because the framework is built on lightweight LLMs, it can run entirely on local machines in real-world applications.
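The abstract does not specify how the LLM labels are aggregated, how annotation discrepancy is computed, or which negative-learning loss is used, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes majority voting over the LLM annotators, defines discrepancy hypothetically as one minus the share of the most common label, and, for samples whose discrepancy exceeds a hypothetical `threshold`, swaps in the complementary-label loss -log(1 - p_k) from the negative-learning literature, using classes that no annotator proposed as complementary labels. All function names and the `EPS` constant are illustrative.

```python
import math
from collections import Counter

EPS = 1e-12  # hypothetical clamp to avoid log(0)


def aggregate_votes(votes):
    """Majority-vote aggregation of labels from several LLM annotators.

    Returns (label, discrepancy). Discrepancy is a hypothetical definition:
    1 - (share of the most common label), so 0.0 means full agreement.
    Ties go to the label encountered first (Counter.most_common ordering).
    """
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    return label, 1.0 - top / len(votes)


def negative_learning_loss(probs, complementary):
    """Complementary-label loss: mean of -log(1 - p_k) over labels k the
    sample is assumed NOT to have; lowers the model's confidence in them."""
    return -sum(math.log(max(1.0 - probs[k], EPS)) for k in complementary) / len(complementary)


def sample_loss(probs, votes, num_classes, threshold=0.5):
    """Per-sample training loss for a downstream classifier.

    probs: the classifier's predicted class probabilities for this sample.
    votes: class indices proposed by the LLM annotators.
    threshold: hypothetical discrepancy cut-off for marking a label unreliable.
    """
    label, discrepancy = aggregate_votes(votes)
    if discrepancy <= threshold:
        # Reliable annotation: ordinary cross-entropy on the aggregated label.
        return -math.log(max(probs[label], EPS))
    # Unreliable annotation: treat classes no annotator proposed as complementary.
    voted = set(votes)
    complementary = [k for k in range(num_classes) if k not in voted]
    if not complementary:
        return 0.0  # every class received a vote; nothing safe to push down
    return negative_learning_loss(probs, complementary)


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.6, 0.1]
    print(sample_loss(probs, [2, 2, 1], num_classes=4))  # 2/3 agree: cross-entropy on class 2
    print(sample_loss(probs, [0, 1, 2], num_classes=4))  # full disagreement: negative learning on class 3
```

Falling back to complementary labels, rather than discarding high-discrepancy samples, keeps some training signal from them: even when the LLMs disagree on the correct class, the classes none of them proposed are plausibly wrong.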