LG, CL, ML · Jul 2, 2018

Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

arXiv:1807.00745v21113 citations
AI Analysis

The paper addresses the challenge of limited labeled data for low-resource languages and domains. Its contribution is incremental, since it builds on existing noise-handling methods.

The paper tackles the problem of training neural networks on automatically annotated, noisy data in low-resource settings. By adding a noise layer that models the label noise, it reports a performance improvement of up to 35% on a low-resource NER task.

Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are a quicker and cheaper alternative for obtaining labeled data. However, these labels often contain more errors, which can deteriorate a classifier's performance when it is trained on this data. We propose a noise layer that is added to a neural network architecture. This allows modeling the noise and training on a combination of clean and noisy data. We show that in a low-resource NER task we can improve performance by up to 35% by using additional, noisy data and handling the noise.
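The abstract's idea of a noise layer can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the noise layer is a row-stochastic matrix that maps the base model's clean-label distribution to a noisy-label distribution, and that this matrix is estimated from a small set of instances carrying both a clean and a noisy label. All function names (`estimate_noise_matrix`, `apply_noise_layer`, `combined_loss`) are hypothetical, and the matrix is kept fixed here for simplicity, whereas a trainable layer would update it during training.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def estimate_noise_matrix(clean_labels, noisy_labels, num_classes, smoothing=1e-3):
    # Assumption: some instances have both a clean and a noisy label.
    # Entry [i, j] approximates p(noisy label = j | clean label = i).
    counts = np.full((num_classes, num_classes), smoothing)
    for c, n in zip(clean_labels, noisy_labels):
        counts[c, n] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def apply_noise_layer(clean_probs, noise_matrix):
    # p(noisy = j | x) = sum_i p(clean = i | x) * noise_matrix[i, j]
    return clean_probs @ noise_matrix

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the given integer labels.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def combined_loss(clean_logits, clean_labels, noisy_logits, noisy_labels, noise_matrix):
    # Clean instances are scored on the base model's output directly;
    # noisy instances are scored after passing through the noise layer.
    clean_probs = softmax(clean_logits)
    noisy_probs = apply_noise_layer(softmax(noisy_logits), noise_matrix)
    return cross_entropy(clean_probs, clean_labels) + cross_entropy(noisy_probs, noisy_labels)
```

In this sketch the base classifier is shared between both kinds of data, so the noisy instances still contribute training signal while the noise layer absorbs systematic labeling errors. At prediction time only the base model's output would be used.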

Code Implementations: 1 repo
