CL, AI, IR | Feb 1, 2025

DEUCE: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning

arXiv:2502.00305v1 | 21 citations | h-index: 20 | TACL
Originality: Incremental advance
AI Analysis

This work addresses biased data selection in label-scarce text classification, a problem relevant to NLP practitioners. It is an incremental advance that builds on existing active learning methods.

The paper tackles biased learning in cold-start active learning (CSAL) for text classification. It proposes the DEUCE framework, which combines dual diversity (textual and class) with uncertainty awareness to select class-balanced, hard representative instances, and reports superior performance on six NLP datasets.
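As a rough illustration of how predictive uncertainty and class balance can interact, the Python sketch below scores each unlabeled example by the entropy of its softmax class distribution and keeps the most uncertain examples per predicted class. The entropy measure and the per-class quota are assumptions chosen for illustration, and all function names (`softmax`, `predictive_entropy`, `rank_by_uncertainty`, `balanced_uncertain_pick`) are hypothetical. This is a simple baseline, not DEUCE's graph-based selection.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rank_by_uncertainty(all_logits: Sequence[Sequence[float]]) -> List[int]:
    """Indices of examples sorted from most to least uncertain (ties keep input order)."""
    scores = [predictive_entropy(softmax(row)) for row in all_logits]
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def balanced_uncertain_pick(all_logits: Sequence[Sequence[float]], per_class: int) -> List[int]:
    """Hypothetical baseline: keep the `per_class` most uncertain examples per predicted class."""
    picked: Dict[int, List[int]] = {}
    for i in rank_by_uncertainty(all_logits):
        probs = softmax(all_logits[i])
        predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax class
        bucket = picked.setdefault(predicted, [])
        if len(bucket) < per_class:
            bucket.append(i)
    # Flatten the per-class buckets into one sorted list of example indices.
    return sorted(i for bucket in picked.values() for i in bucket)
```

A per-class quota like this only balances over *predicted* classes, so a weak class the model rarely predicts can still be underrepresented in the selected batch.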

Cold-start active learning (CSAL) selects valuable instances from an unlabeled dataset for manual annotation. It provides high-quality data at a low annotation cost for label-scarce text classification. However, existing CSAL methods overlook weak classes and hard representative examples, resulting in biased learning. To address these issues, this paper proposes a novel dual-diversity enhancing and uncertainty-aware (DEUCE) framework for CSAL. Specifically, DEUCE leverages a pretrained language model (PLM) to efficiently extract textual representations, class predictions, and predictive uncertainty. Then, it constructs a Dual-Neighbor Graph (DNG) to combine information on both textual diversity and class diversity, ensuring a balanced data distribution. It further propagates uncertainty information via density-based clustering to select hard representative instances. By combining dual diversity with informativeness, DEUCE selects class-balanced and hard representative data. Experiments on six NLP datasets demonstrate the superiority and efficiency of DEUCE.
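The pipeline described in the abstract (a neighbor graph over textual and class information, uncertainty propagation, then density-aware selection) can be sketched in simplified form. This is a hypothetical illustration, not the authors' algorithm: it assumes cosine k-nearest neighbors, builds the "dual" graph as the undirected union of text-space and class-space neighbors, propagates uncertainty by repeatedly mixing each node's score with its neighbors' mean, and swaps density-based clustering for a simpler greedy rule that picks nodes with high degree times propagated uncertainty and then blocks their neighbors. All names (`knn`, `dual_neighbor_graph`, `propagate_uncertainty`, `select`) and parameters (`k`, `alpha`, `steps`, `budget`) are assumptions.

```python
import math
from typing import List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn(vectors: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """For each vector, the indices of its k most cosine-similar other vectors."""
    result = []
    for i, v in enumerate(vectors):
        sims = [(cosine(v, w), j) for j, w in enumerate(vectors) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))  # most similar first, then lower index
        result.append([j for _, j in sims[:k]])
    return result


def dual_neighbor_graph(embeddings, class_probs, k: int) -> List[Set[int]]:
    """Assumed DNG: union of text-space and class-space kNN, made undirected."""
    text_nn = knn(embeddings, k)
    class_nn = knn(class_probs, k)
    graph = [set(t) | set(c) for t, c in zip(text_nn, class_nn)]
    for i in range(len(graph)):
        for j in list(graph[i]):
            graph[j].add(i)  # symmetrize: if i links to j, j links back to i
    return graph


def propagate_uncertainty(uncertainty, graph, alpha: float = 0.5, steps: int = 3) -> List[float]:
    """Each step mixes a node's own score with the mean score of its neighbors."""
    u = list(uncertainty)
    for _ in range(steps):
        new_u = []
        for i, nbrs in enumerate(graph):
            if not nbrs:
                new_u.append(u[i])
                continue
            neighbor_mean = sum(u[j] for j in sorted(nbrs)) / len(nbrs)
            new_u.append(alpha * u[i] + (1 - alpha) * neighbor_mean)
        u = new_u
    return u


def select(embeddings, class_probs, uncertainty, k: int, budget: int) -> List[int]:
    """Greedy stand-in for density-based clustering: pick dense, uncertain nodes first."""
    graph = dual_neighbor_graph(embeddings, class_probs, k)
    density = [len(nbrs) for nbrs in graph]  # node degree as a crude local-density proxy
    u = propagate_uncertainty(uncertainty, graph)
    order = sorted(range(len(graph)), key=lambda i: (-density[i] * u[i], i))
    chosen: List[int] = []
    blocked: Set[int] = set()
    for i in order:
        if len(chosen) >= budget:
            break
        if i in blocked:
            continue
        chosen.append(i)
        blocked |= graph[i] | {i}  # skip the pick's neighbors to keep the batch diverse
    return chosen


if __name__ == "__main__":
    # Toy data: two text clusters, each with a predicted class; scores are made up.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
    probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8], [0.4, 0.6]]
    unc = [0.3, 0.5, 0.7, 0.3, 0.5, 0.7]
    print(select(emb, probs, unc, k=2, budget=2))
```

On this toy data the two least confident points (indices 2 and 5) are not text neighbors but become class-space neighbors, so they gain degree and rank highest; after the first of them is picked and its neighbors are blocked, the second pick comes from the other text cluster. The greedy blocking step is only a stand-in for the density-based clustering the abstract describes.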
