CL · Sep 22, 2018

Semi-Supervised Sequence Modeling with Cross-View Training

arXiv:1809.08370v1 · 1281 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for more data-efficient NLP models by strengthening representation learning with semi-supervised training. The advance is incremental, since it builds on existing components such as Bi-LSTM encoders.

The paper tackles the problem that supervised models do not learn from unlabeled data during task-specific training. It proposes Cross-View Training (CVT), a semi-supervised algorithm that improves Bi-LSTM encoder representations using both labeled and unlabeled data, and reports state-of-the-art results on five sequence tagging tasks, machine translation, and dependency parsing.

Unsupervised representation learning algorithms such as word2vec and ELMo improve the accuracy of many supervised NLP models, mainly because they can take advantage of large amounts of unlabeled text. However, the supervised models only learn from task-specific labeled data during the main training phase. We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data. On labeled examples, standard supervised learning is used. On unlabeled examples, CVT teaches auxiliary prediction modules that see restricted views of the input (e.g., only part of a sentence) to match the predictions of the full model seeing the whole input. Since the auxiliary modules and the full model share intermediate representations, this in turn improves the full model. Moreover, we show that CVT is particularly effective when combined with multi-task learning. We evaluate CVT on five sequence tagging tasks, machine translation, and dependency parsing, achieving state-of-the-art results.
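
The unlabeled-data step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the use of KL divergence as the "match the predictions" objective, the function names, and the view names ("forward", "backward") are all assumptions made for illustration. The sketch computes, for a single token, how far each auxiliary module's prediction is from the full model's prediction.

```python
import math


def softmax(logits):
    """Turn raw label scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cvt_unlabeled_loss(full_logits, aux_logits_by_view):
    """Consistency loss for one token of an unlabeled sentence (hypothetical helper).

    full_logits: label scores from the full model, which sees the whole input.
    aux_logits_by_view: maps a view name to the label scores of an auxiliary
        module that sees only a restricted view of the input.
    """
    # Assumption: the full model's prediction serves as the target that
    # every auxiliary module is trained to match.
    target = softmax(full_logits)
    return sum(
        kl_divergence(target, softmax(aux_logits))
        for aux_logits in aux_logits_by_view.values()
    )


# Made-up scores for a three-label tagging task.
full = [2.0, 0.5, -1.0]
views = {
    "forward": [1.0, 1.0, -0.5],   # e.g. sees only the left context
    "backward": [2.2, 0.1, -1.2],  # e.g. sees only the right context
}
loss = cvt_unlabeled_loss(full, views)
```

Minimizing such a loss with respect to the auxiliary modules would also update the encoder they share with the full model, which is the mechanism the abstract credits for the improved representations. On labeled examples an ordinary supervised loss would be used instead.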

Code Implementations: 2 repos
