CL · Oct 22, 2020

An Analysis of Simple Data Augmentation for Named Entity Recognition

arXiv:2010.11683v1 · 31.7 · 1023 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses data scarcity in domain-specific NER tasks, but it is incremental: it adapts existing sentence-level augmentation methods to token-level NER.

The paper tackles the problem of improving named entity recognition (NER) performance, particularly with small training sets, by applying simple data augmentation techniques. It finds that augmentation boosts performance for both recurrent and transformer-based models on biomedical and materials science datasets.

Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
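Because NER is modeled as token-level sequence labeling, an augmentation must keep every token aligned with its label, unlike sentence-level augmentation, where a single label covers the whole input. The following is a minimal sketch of one simple way to preserve that alignment, label-wise token replacement, where a token is swapped for another training token that carries the same label. The choice of technique, the function names, the replacement probability `p`, and the toy BIO-tagged sentences are all illustrative assumptions, not the paper's released code.

```python
import random
from collections import defaultdict

# Illustrative sketch only: technique, names, and data are assumptions,
# not the paper's released code.

# A sentence is a list of (token, BIO label) pairs.
Sentence = list[tuple[str, str]]


def build_label_vocab(corpus: list[Sentence]) -> dict[str, list[str]]:
    """Collect, for each label, every token seen with that label in training."""
    vocab: dict[str, list[str]] = defaultdict(list)
    for sentence in corpus:
        for token, label in sentence:
            vocab[label].append(token)
    return dict(vocab)


def label_wise_token_replacement(
    sentence: Sentence,
    vocab: dict[str, list[str]],
    p: float,
    rng: random.Random,
) -> Sentence:
    """Replace each token with probability p by a token sharing its label.

    The label sequence is copied unchanged, so the augmented sentence stays
    aligned token-by-token with its BIO tags.
    """
    augmented: Sentence = []
    for token, label in sentence:
        if rng.random() < p:
            token = rng.choice(vocab[label])
        augmented.append((token, label))
    return augmented


# Toy training data (made up for illustration).
train: list[Sentence] = [
    [("Patient", "O"), ("received", "O"), ("aspirin", "B-treatment"), ("daily", "O")],
    [("Start", "O"), ("metformin", "B-treatment"), ("now", "O")],
]
vocab = build_label_vocab(train)
rng = random.Random(0)
augmented = label_wise_token_replacement(train[0], vocab, p=0.3, rng=rng)
print(augmented)
```

Sampling replacements per label keeps the tag sequence valid by construction; an augmentation that swaps whole multi-token entity mentions would instead need span-aware logic to rebuild the B-/I- tags.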

