CL · LG · Sep 9, 2021

Cartography Active Learning

arXiv:2109.04282v3672 citations
Originality: Incremental advance
AI Analysis

This work addresses data efficiency in active learning for text classification, though it appears incremental as it builds on existing data map concepts.

The authors address the problem of selecting informative instances for labeling in active learning. They propose Cartography Active Learning (CAL), which uses training dynamics as a proxy for informativeness, and show that it is competitive with common methods, achieving comparable or better results with less data.

We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive with other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics utilizing the data maps. Our results further show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.
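To make "training dynamics" concrete, the sketch below computes the three per-instance statistics used in data maps (Swayamdipta et al., 2020): confidence, variability, and correctness, each aggregated over training epochs. It is an illustration only, not the authors' implementation; the function name `data_map_statistics` and the input layout (one row per epoch, one column per instance) are assumptions made for this example, and the step CAL uses to turn these statistics into a selection rule is not shown.

```python
import numpy as np


def data_map_statistics(gold_probs, correct):
    """Compute data-map statistics from per-epoch training records.

    gold_probs: array-like of shape (epochs, instances); the probability the
        model assigned to each instance's gold label at the end of each epoch.
    correct: array-like of shape (epochs, instances); True where the model's
        prediction matched the gold label at that epoch.

    Both names and the layout are hypothetical choices for this sketch.
    """
    gold_probs = np.asarray(gold_probs, dtype=float)
    correct = np.asarray(correct, dtype=bool)

    # Confidence: mean gold-label probability across epochs.
    confidence = gold_probs.mean(axis=0)
    # Variability: standard deviation of that probability across epochs.
    variability = gold_probs.std(axis=0)
    # Correctness: fraction of epochs in which the instance was predicted correctly.
    correctness = correct.mean(axis=0)
    return confidence, variability, correctness


# Toy records: 2 epochs, 2 instances (values invented for illustration).
toy_probs = [[0.9, 0.2],
             [0.7, 0.8]]
toy_correct = [[True, False],
               [True, True]]
confidence, variability, correctness = data_map_statistics(toy_probs, toy_correct)
```

In this toy input the first instance is consistently predicted well (high confidence, low variability), while the second fluctuates between epochs (higher variability). Instances of the second kind are the ones a data map would place in its ambiguous region.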

Code Implementations: 2 repos