CL LG Oct 5, 2020

SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup

arXiv:2010.02322v131.51016 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses label efficiency for low-resource sequence labeling tasks, offering an incremental improvement over standard active learning methods.

The paper proposes SeqMix, a data augmentation method for active sequence labeling that generates extra labeled sequences from the queried samples via mixup. On Named Entity Recognition and Event Detection tasks, it improves F1 scores over the standard active sequence labeling method by 2.27% to 3.75%.

Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human annotations. We propose a simple but effective data augmentation method to improve the label efficiency of active sequence labeling. Our method, SeqMix, simply augments the queried samples by generating extra labeled sequences in each iteration. The key difficulty is to generate plausible sequences along with token-level labels. In SeqMix, we address this challenge by performing mixup for both sequences and token-level labels of the queried samples. Furthermore, we design a discriminator during sequence mixup, which judges whether the generated sequences are plausible or not. Our experiments on Named Entity Recognition and Event Detection tasks show that SeqMix can improve the standard active sequence labeling method by $2.27\%$--$3.75\%$ in terms of $F_1$ scores. The code and data for SeqMix can be found at https://github.com/rz-zhang/SeqMix
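
The mixup step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes several details the abstract does not state: that mixing happens in a token embedding space, that each mixed vector is mapped back to the nearest vocabulary token, that the mixing weight is drawn from a Beta distribution, and that only equal-length sequences are paired. The function names, the toy embeddings, and the `is_plausible` callback standing in for the discriminator are all hypothetical.

```python
import random

def mix_vectors(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

def one_hot(label, label_set):
    """One-hot encoding of a label over an ordered label set."""
    return [1.0 if label == l else 0.0 for l in label_set]

def nearest_token(vec, embeddings):
    """Vocabulary token whose embedding has the smallest squared L2 distance to vec."""
    return min(
        embeddings,
        key=lambda tok: sum((e - v) ** 2 for e, v in zip(embeddings[tok], vec)),
    )

def seq_mixup(seq_a, seq_b, embeddings, label_set, lam):
    """Mix two equal-length sequences of (token, label) pairs position by position.

    Returns a list of (token, soft_label) pairs, where soft_label is the
    same convex combination applied to the one-hot labels.
    """
    mixed = []
    for (tok_a, lab_a), (tok_b, lab_b) in zip(seq_a, seq_b):
        vec = mix_vectors(embeddings[tok_a], embeddings[tok_b], lam)
        soft = mix_vectors(one_hot(lab_a, label_set), one_hot(lab_b, label_set), lam)
        mixed.append((nearest_token(vec, embeddings), soft))
    return mixed

def augment(queried, embeddings, label_set, is_plausible, n_new, rng, alpha=8.0):
    """Generate up to n_new mixed sequences that the discriminator accepts.

    queried needs at least two sequences. is_plausible is a hypothetical
    callback taking a token list and returning True or False. alpha is an
    assumed Beta parameter, not a value from the paper.
    """
    out, attempts = [], 0
    while len(out) < n_new and attempts < 10 * n_new:
        attempts += 1
        a, b = rng.sample(queried, 2)
        if len(a) != len(b):
            continue  # assumption: only pair sequences of equal length
        lam = rng.betavariate(alpha, alpha)
        candidate = seq_mixup(a, b, embeddings, label_set, lam)
        if is_plausible([tok for tok, _ in candidate]):
            out.append(candidate)
    return out

# Toy data (hypothetical): 2-d embeddings and two labeled sentences.
embeddings = {
    "visited": [0.0, 1.0], "left": [0.1, 0.9],
    "Paris": [1.0, 0.0], "London": [0.9, 0.1],
}
label_set = ["O", "B-LOC"]
seq_a = [("visited", "O"), ("Paris", "B-LOC")]
seq_b = [("left", "O"), ("London", "B-LOC")]

mixed = seq_mixup(seq_a, seq_b, embeddings, label_set, lam=0.7)
extra = augment([seq_a, seq_b], embeddings, label_set,
                is_plausible=lambda toks: True, n_new=3, rng=random.Random(0))
```

In an active learning loop, sequences such as `extra` would be added to the training set alongside the newly annotated samples in each iteration. The accept-everything `is_plausible` used here is only a placeholder. A real discriminator would score each generated token sequence and reject implausible ones.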
