CL · LG · Aug 15, 2021

Deep Active Learning for Text Classification with Diverse Interpretations

arXiv:2108.10687v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work aims to reduce annotation costs in text classification and represents an incremental improvement in active learning methods.

The paper tackles the challenge of measuring sample informativeness in deep active learning for text classification. It proposes ALDEN, which selects samples according to the diversity of their local interpretations, and reports that ALDEN consistently outperforms several state-of-the-art deep active learning methods.

Deep Neural Networks (DNNs) have recently made remarkable progress in text classification, but they still require a large amount of labeled data. To train high-performing models at minimal annotation cost, active learning selects and labels the most informative samples, yet measuring the informativeness of samples for DNNs remains challenging. In this paper, inspired by the piece-wise linear interpretability of DNNs, we propose a novel Active Learning with DivErse iNterpretations (ALDEN) approach. Using local interpretations in DNNs, ALDEN identifies linearly separable regions of samples. It then selects samples according to the diversity of their local interpretations and queries their labels. To tackle the text classification problem, we choose the word with the most diverse interpretations to represent the whole sentence. Extensive experiments demonstrate that ALDEN consistently outperforms several state-of-the-art deep active learning methods.
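
To make the selection idea in the abstract concrete, here is a minimal sketch of one possible reading of it. It is not the authors' implementation. Everything in it is an assumption made for illustration: the one-hidden-layer ReLU scorer applied to each word vector, the use of the input gradient as the "local interpretation", the cosine-distance diversity measure, and the hypothetical function names `local_interpretation`, `sentence_score`, and `select_batch`. The paper's actual network, interpretation method, and diversity criterion may differ.

```python
import numpy as np

def local_interpretation(x, W1, b1, W2):
    """Input gradient of f(x) = W2 @ relu(W1 @ x + b1) for a scalar output.

    A ReLU network is linear inside each activation region, so the gradient
    there is (W2 * mask) @ W1, where mask marks the active hidden units.
    Assumption: this gradient stands in for the "local interpretation".
    """
    mask = (W1 @ x + b1 > 0).astype(float)  # 1.0 for active hidden units
    return (W2 * mask) @ W1                 # shape: (input_dim,)

def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity; eps guards against zero-norm gradients."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def sentence_score(word_vectors, labeled_interps, W1, b1, W2):
    """Score a sentence by its most diverse word (assumed criterion).

    A word's diversity is its smallest cosine distance to any interpretation
    already seen in the labeled set; the sentence takes the maximum over words.
    """
    best = 0.0
    for x in word_vectors:
        g = local_interpretation(x, W1, b1, W2)
        d = min(cosine_distance(g, h) for h in labeled_interps)
        best = max(best, d)
    return best

def select_batch(pool, labeled_interps, W1, b1, W2, k):
    """Return indices of the k highest-scoring unlabeled sentences."""
    scores = [sentence_score(s, labeled_interps, W1, b1, W2) for s in pool]
    order = np.argsort(scores)[::-1][:k]    # descending by score
    return [int(i) for i in order]
```

In an active learning loop, the indices returned by `select_batch` would be sent to annotators, the model retrained on the enlarged labeled set, and the interpretations of the newly labeled samples appended to `labeled_interps` before the next round.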

