CL LG Oct 5, 2020

SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup

arXiv:2010.02322v11016 citations Has Code
Originality: Incremental advance
AI Analysis

This work addresses label efficiency for low-resource sequence labeling tasks, offering an incremental improvement over standard active learning methods.

The paper tackles the label inefficiency of active learning for sequence labeling by proposing SeqMix, a data augmentation method that generates extra labeled sequences via mixup. On Named Entity Recognition and Event Detection tasks, it improves F1 scores over the standard active sequence labeling method by 2.27% to 3.75%.

Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human annotations. We propose a simple but effective data augmentation method to improve the label efficiency of active sequence labeling. Our method, SeqMix, simply augments the queried samples by generating extra labeled sequences in each iteration. The key difficulty is to generate plausible sequences along with token-level labels. In SeqMix, we address this challenge by performing mixup for both sequences and token-level labels of the queried samples. Furthermore, we design a discriminator during sequence mixup, which judges whether the generated sequences are plausible or not. Our experiments on Named Entity Recognition and Event Detection tasks show that SeqMix can improve the standard active sequence labeling method by $2.27\%$--$3.75\%$ in terms of $F_1$ scores. The code and data for SeqMix can be found at https://github.com/rz-zhang/SeqMix
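The mixup step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes that mixup means a token-wise convex combination of embeddings and one-hot labels with a coefficient drawn from a Beta distribution, and that a mixed embedding is mapped back to its nearest vocabulary token. The names `mix_sequences` and `nearest_token`, the toy vocabulary, and the Beta parameters are all hypothetical. The discriminator is omitted.

```python
import random


def mix_sequences(emb_a, emb_b, labels_a, labels_b, lam):
    """Token-wise convex combination of two aligned sequences.

    emb_*:    list of token embedding vectors (lists of floats)
    labels_*: list of one-hot label vectors (lists of floats)
    lam:      mixing coefficient in [0, 1]
    """
    if not (len(emb_a) == len(emb_b) == len(labels_a) == len(labels_b)):
        raise ValueError("sequences must be aligned token by token")

    def blend(u, v):
        # lam * u + (1 - lam) * v, element by element
        return [lam * x + (1.0 - lam) * y for x, y in zip(u, v)]

    mixed_emb = [blend(u, v) for u, v in zip(emb_a, emb_b)]
    mixed_labels = [blend(u, v) for u, v in zip(labels_a, labels_b)]
    return mixed_emb, mixed_labels


def nearest_token(vec, vocab_emb):
    """Return the vocabulary token whose embedding is closest to vec
    (squared Euclidean distance). Assumed decoding step, for illustration."""
    return min(
        vocab_emb,
        key=lambda tok: sum((a - b) ** 2 for a, b in zip(vec, vocab_emb[tok])),
    )


# Toy example with a hypothetical 2-d embedding table and two labels (O, ENT).
vocab_emb = {"Obama": [0.9, 0.2], "Paris": [1.0, 0.0], "visited": [0.0, 1.0]}
O, ENT = [1.0, 0.0], [0.0, 1.0]

seq_a, tags_a = ["Obama", "visited"], [ENT, O]
seq_b, tags_b = ["Paris", "visited"], [ENT, O]

lam = random.betavariate(8, 8)  # assumed Beta(alpha, alpha) sampling
mixed_emb, mixed_tags = mix_sequences(
    [vocab_emb[t] for t in seq_a],
    [vocab_emb[t] for t in seq_b],
    tags_a,
    tags_b,
    lam,
)
mixed_tokens = [nearest_token(v, vocab_emb) for v in mixed_emb]
```

In a full pipeline, each generated sequence would additionally be passed to a discriminator that accepts or rejects it as plausible before it is added to the training set.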

Code Implementations: 1 repo