CL · Jul 13, 2018

Low-Resource Text Classification using Domain-Adversarial Learning

Daniel Grießhaber, Ngoc Thang Vu, Johannes Maucher

arXiv:1807.05195v2 · 0.52 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of limited annotated data in NLP for new domains or languages, though it appears incremental because it builds on existing domain-adversarial methods.

The paper tackles low-resource text classification by using domain-adversarial learning to train domain-invariant features and avoid overfitting in deep neural networks. It also shows that monolingual word vectors can achieve performance comparable to pretrained multilingual vectors without prealignment.

Deep learning techniques have recently been shown to be successful in many natural language processing tasks, forming state-of-the-art systems. They require, however, a large amount of annotated data, which is often missing. This paper explores the use of domain-adversarial learning as a regularizer to avoid overfitting when training domain-invariant features for deep, complex neural networks in low-resource and zero-resource settings in new target domains or languages. In the case of new languages, we show that monolingual word vectors can be used directly for training without prealignment. Their projection into a common space can be learnt ad hoc at training time, reaching the final performance of pretrained multilingual word vectors.
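
To make the idea of domain-adversarial learning as a regularizer concrete, here is a minimal sketch of one common way to implement it: a gradient reversal step between the feature extractor and a domain classifier. The abstract does not say the paper uses exactly this formulation, so everything here is an assumption for illustration: the scalar model, the function name `dann_gradients`, and the parameters `w`, `u`, `v`, and `lam` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dann_gradients(x, y, d, w, u, v, lam):
    """Gradients for one example in a toy scalar domain-adversarial model.

    Assumed (hypothetical) model:
      feature      f = w * x
      label head   predicts sigmoid(u * f) for task label y
      domain head  predicts sigmoid(v * f) for domain label d
    Both heads use binary cross-entropy, whose gradient with respect
    to the logit is (prediction - target).
    """
    f = w * x
    label_err = sigmoid(u * f) - y
    domain_err = sigmoid(v * f) - d

    # The two heads receive their ordinary gradients.
    grad_u = label_err * f
    grad_v = domain_err * f

    # Gradient reversal: the domain gradient flowing back into the
    # features is negated and scaled by lam, so the feature extractor
    # is pushed to make the domains harder to tell apart.
    grad_f = label_err * u - lam * domain_err * v
    grad_w = grad_f * x
    return grad_w, grad_u, grad_v


# lam = 0 disables the adversarial term; lam = 1 applies it fully.
no_adv = dann_gradients(1.0, 1.0, 1.0, w=1.0, u=0.0, v=1.0, lam=0.0)
with_adv = dann_gradients(1.0, 1.0, 1.0, w=1.0, u=0.0, v=1.0, lam=1.0)
```

In this toy setting the domain head still learns to predict the domain, while the sign flip on the feature extractor's gradient works against it. That opposition is what acts as the regularizer: features that are useful only for telling domains apart are discouraged.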
