CLJan 22, 2019

Delta-training: Simple Semi-Supervised Text Classification using Pretrained Word Embeddings

arXiv:1901.07651v31001 citations
Originality Incremental advance
AI Analysis

This addresses text classification with limited labeled data, but it is incremental as it builds on existing self-training frameworks.

The paper tackles semi-supervised text classification by proposing Delta-training, a method that uses two classifiers with different word embeddings to focus on prediction differences on unlabeled data, and it outperforms self-training and co-training on four datasets.

We propose a novel and simple method for semi-supervised text classification. The method stems from the hypothesis that a classifier with pretrained word embeddings always outperforms the same classifier with randomly initialized word embeddings, as empirically observed in NLP tasks. Our method first builds two sets of classifiers as a form of model ensemble, and then initializes their word embeddings differently: one using random, the other using pretrained word embeddings. We focus on different predictions between the two classifiers on unlabeled data while following the self-training framework. We also use early-stopping in meta-epoch to improve the performance of our method. Our method, Delta-training, outperforms the self-training and the co-training framework in 4 different text classification datasets, showing robustness against error accumulation.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes