CLApr 3, 2019

Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders

arXiv:1904.02142v2128 citations
AI Analysis

This addresses the problem of unsupervised syntax discovery for natural language processing, offering a novel approach that is not incremental but sets a new benchmark.

The paper tackled unsupervised binary constituency parsing by introducing DIORA, a method that learns syntax and constituent representations without supervision, achieving new state-of-the-art F1 scores on WSJ and MultiNLI datasets.

We introduce deep inside-outside recursive autoencoders (DIORA), a fully-unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. Our approach predicts each word in an input sentence conditioned on the rest of the sentence and uses inside-outside dynamic programming to consider all possible binary trees over the sentence. At test time the CKY algorithm extracts the highest scoring parse. DIORA achieves a new state-of-the-art F1 in unsupervised binary constituency parsing (unlabeled) in two benchmark datasets, WSJ and MultiNLI.

Code Implementations3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes