Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder
This work addresses the scarcity of labeled data for syntactic parsing in most languages, offering a way to exploit unlabeled text beyond its use for learning word embeddings.
The paper tackles the cost of human annotation for syntactic parsing by proposing a novel latent-variable generative model for semi-supervised dependency parsing, and demonstrates its effectiveness on English, French, and Swedish datasets.
Human annotation for syntactic parsing is expensive, and large resources are available only for a fraction of languages. We ask whether one can leverage abundant unlabeled texts to improve syntactic parsers, beyond just using the texts to obtain more generalisable lexical features (i.e. beyond word embeddings). To this end, we propose a novel latent-variable generative model for semi-supervised syntactic dependency parsing. As exact inference is intractable, we introduce a differentiable relaxation to obtain approximate samples and compute gradients with respect to the parser parameters. Our method (Differentiable Perturb-and-Parse) relies on differentiable dynamic programming over stochastically perturbed edge scores. We demonstrate the effectiveness of our approach with experiments on English, French and Swedish.
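To make the idea of relaxing a parse over perturbed edge scores concrete, the following is a minimal illustrative sketch and not the model described above. It assumes Gumbel noise as the perturbation and replaces a hard per-word argmax over candidate heads with a temperature softmax. It deliberately drops the tree constraint and the dynamic program, so each word picks its head independently. All names (`sample_gumbel`, `perturb_scores`, `relaxed_heads`) and the toy scores are hypothetical.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform sampling."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))


def perturb_scores(scores, rng):
    """Add independent Gumbel noise to every edge score (assumed noise model)."""
    return [[s + sample_gumbel(rng) for s in row] for row in scores]


def relaxed_heads(scores, temperature=1.0):
    """Soft head choice for each word, ignoring the tree constraint.

    scores[h][m] is the score of the edge h -> m; index 0 is an artificial root.
    Returns weights[h][m], where column m (m >= 1) sums to 1 over heads h != m.
    """
    n = len(scores)
    weights = [[0.0] * n for _ in range(n)]
    for m in range(1, n):
        heads = [h for h in range(n) if h != m]
        logits = [scores[h][m] / temperature for h in heads]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        for h, e in zip(heads, exps):
            weights[h][m] = e / total
    return weights


# Toy example: root (index 0) plus two words, with made-up edge scores.
rng = random.Random(0)
scores = [
    [0.0, 2.0, 0.5],
    [0.0, 0.0, 1.5],
    [0.0, 1.0, 0.0],
]
soft = relaxed_heads(perturb_scores(scores, rng), temperature=0.5)
```

Because the softmax is smooth in the scores, the soft head weights admit gradients with respect to whatever produced the scores. As the temperature approaches zero, each column approaches a one-hot choice of head, which is the sense in which this is a relaxation of a discrete decision.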