CL · Jul 13, 2018

Low-Resource Text Classification using Domain-Adversarial Learning

arXiv:1807.05195v2 · 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of limited annotated data when applying NLP models to new domains or languages. The contribution appears incremental, since it builds on existing domain-adversarial methods.

The paper tackles low-resource text classification by using domain-adversarial learning to train domain-invariant features, which acts as a regularizer against overfitting in deep neural networks. It also shows that monolingual word vectors, used without prealignment, can match the performance of pretrained multilingual vectors.

Deep learning techniques have recently been shown to be successful in many natural language processing tasks, forming state-of-the-art systems. They require, however, a large amount of annotated data, which is often missing. This paper explores the use of domain-adversarial learning as a regularizer to avoid overfitting when training domain-invariant features for deep, complex neural networks in low-resource and zero-resource settings in new target domains or languages. In the case of new languages, we show that monolingual word vectors can be directly used for training without prealignment. Their projection into a common space can be learnt ad hoc at training time, reaching the final performance of pretrained multilingual word vectors.
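A common way to realize domain-adversarial training is gradient reversal: a domain classifier learns to tell source from target, while the shared feature extractor receives the negated domain gradient, which pushes it toward domain-invariant features. The abstract does not say which formulation the paper uses, so the following is only a minimal sketch of that general idea on a toy scalar model. The names `dann_gradients`, `w`, `u`, `v`, and `lam` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dann_gradients(w, u, v, x, y, d, lam):
    """Gradients for one example in a toy scalar domain-adversarial model.

    Assumed (illustrative) model, not taken from the paper:
      feature      f   = w * x
      task head    p_y = sigmoid(u * f), binary cross-entropy against y
      domain head  p_d = sigmoid(v * f), binary cross-entropy against d
    lam scales the reversed domain gradient reaching the feature extractor.
    """
    f = w * x
    p_y = sigmoid(u * f)
    p_d = sigmoid(v * f)

    # For sigmoid + cross-entropy, d(loss)/d(logit) = prediction - target.
    g_u = (p_y - y) * f          # task head: minimizes the task loss
    g_v = (p_d - d) * f          # domain head: minimizes the domain loss

    g_f_task = (p_y - y) * u     # task loss gradient w.r.t. the feature
    g_f_dom = (p_d - d) * v      # domain loss gradient w.r.t. the feature

    # Gradient reversal: the feature extractor descends the task loss
    # but ascends the domain loss, scaled by lam.
    g_w = (g_f_task - lam * g_f_dom) * x
    return g_w, g_u, g_v


# Same example with and without the adversarial term.
plain = dann_gradients(w=0.5, u=1.0, v=2.0, x=1.5, y=1, d=0, lam=0.0)
adversarial = dann_gradients(w=0.5, u=1.0, v=2.0, x=1.5, y=1, d=0, lam=1.0)
```

Setting `lam` to zero recovers ordinary supervised training, so the adversarial term can be read as a tunable regularizer on the shared features. Only the feature extractor's gradient changes; both heads are trained as usual.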
