CL · Oct 22, 2020

An Analysis of Simple Data Augmentation for Named Entity Recognition

arXiv:2010.11683v1 · 1023 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity in domain-specific NER tasks, but it is incremental as it adapts existing augmentation methods to a new problem.

The paper tackles the problem of improving named entity recognition (NER) performance, particularly for small training sets, by applying simple data augmentation techniques. It finds that these techniques boost performance for both recurrent and transformer-based models on biomedical and materials science datasets.

Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
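
The abstract does not spell out the individual techniques, so the following is only an illustrative sketch of one simple token-level augmentation: swapping an entity mention for another mention of the same type and regenerating the BIO labels so that tokens and labels stay aligned. The function names (`extract_mentions`, `replace_mentions`), the `mention_bank` structure, and the replacement probability `p` are assumptions made for this example, not identifiers from the paper or its code.

```python
import random


def extract_mentions(tokens, labels):
    """Return (start, end, type) spans from BIO labels; end is exclusive."""
    spans, start, etype = [], None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if lab.startswith("I-") and start is not None and lab[2:] == etype:
            continue  # still inside the current mention
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    return spans


def replace_mentions(tokens, labels, mention_bank, p=0.5, rng=None):
    """Replace each mention, with probability p, by a same-type mention.

    mention_bank (hypothetical structure) maps an entity type to a list of
    mentions, each mention being a list of tokens, e.g. collected from the
    training set. Labels are rebuilt so they match the new mention length.
    """
    rng = rng or random.Random()
    new_tokens, new_labels, cursor = [], [], 0
    for start, end, etype in extract_mentions(tokens, labels):
        # Copy everything between the previous mention and this one unchanged.
        new_tokens.extend(tokens[cursor:start])
        new_labels.extend(labels[cursor:start])
        candidates = mention_bank.get(etype, [])
        if candidates and rng.random() < p:
            mention = rng.choice(candidates)
        else:
            mention = tokens[start:end]
        new_tokens.extend(mention)
        new_labels.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(mention) - 1))
        cursor = end
    new_tokens.extend(tokens[cursor:])
    new_labels.extend(labels[cursor:])
    return new_tokens, new_labels
```

Because the replacement mention may be longer or shorter than the original, the labels are regenerated rather than copied. This alignment requirement is the main difference from augmentation for sentence-level tasks, where a single label applies to the whole sentence.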
