MLLGJun 14, 2012

Identifiability and Unmixing of Latent Parse Trees

arXiv:1206.3137v141 citations
Originality Incremental advance
AI Analysis

This work tackles the problem of unsupervised parsing for computational linguistics, offering incremental improvements in identifiability and estimation for specific model classes.

The paper addresses unsupervised parsing by determining which models are identifiable from infinite data using a Jacobian-based technique and proposes an efficient parameter estimation method called unmixing for identifiable models, overcoming limitations of EM and spectral methods due to varying parse tree topologies.

This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes