A Bayesian Model of Multilingual Unsupervised Semantic Role Induction
This work addresses semantic role labeling for multilingual NLP applications. Its contribution is incremental: it builds on existing Bayesian and unsupervised methods, and the cross-lingual alignments yield only limited gains.
The authors tackle unsupervised semantic role induction across multiple languages by proposing a Bayesian model that incorporates parallel corpora. They find that the primary benefit of adding parallel data is the increased volume of monolingual data, while the cross-lingual alignments provide only minor improvements.
We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and use it to explore the usefulness of parallel corpora for this task. Our joint Bayesian model consists of individual models for each language plus additional latent variables that capture alignments between roles across languages. Because it is a generative Bayesian model, we can evaluate it in a variety of scenarios just by varying the inference procedure, without changing the model, which lets us compare the scenarios directly. We compare four settings: using only monolingual data, using a parallel corpus, using a parallel corpus with annotations in the other language, and using small amounts of annotation in the target language. We find that the biggest impact of adding a parallel corpus to training is actually the increase in monolingual data, with the alignments to another language resulting in only small improvements, even with labeled data for the other language.
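The idea that the scenarios differ only in the inference procedure can be illustrated with a toy sketch. This is not the authors' model: the scoring rule, the `sweep` and `run` helpers, the feature names, and the assumption that aligned arguments simply prefer the same role label are all hypothetical simplifications (the model described above uses latent variables for role alignments). The sketch only shows how one sampling routine can cover several scenarios by choosing which roles are clamped as observed and whether alignment links are consulted.

```python
import random

def sweep(roles, clamped, features, aligned, n_roles, rng, use_alignments=True):
    """One sampling sweep over a toy two-language model (hypothetical scores).

    Roles whose keys are in `clamped` are treated as observed and never resampled.
    """
    for key in list(roles):
        if key in clamped:
            continue  # annotated role: keep fixed
        lang = key[0]
        partner = aligned.get(key) if use_alignments else None
        weights = []
        for r in range(n_roles):
            # Toy monolingual score: how many other arguments in the same
            # language share this feature and currently have role r.
            same = sum(1 for k, v in roles.items()
                       if k != key and k[0] == lang
                       and features[k] == features[key] and v == r)
            w = 1.0 + same  # add-one smoothing keeps every weight positive
            # Toy cross-lingual score (assumption): prefer the partner's role.
            if partner is not None and roles[partner] == r:
                w *= 2.0
            weights.append(w)
        roles[key] = rng.choices(range(n_roles), weights=weights)[0]
    return roles

# Hypothetical data: keys are (language, argument index).
features = {("en", 0): "subj", ("en", 1): "obj",
            ("de", 0): "subj", ("de", 1): "obj"}
aligned = {("en", 0): ("de", 0), ("de", 0): ("en", 0),
           ("en", 1): ("de", 1), ("de", 1): ("en", 1)}

def run(clamped_labels, use_alignments, seed=0, n_sweeps=20):
    """Same model and sampler for every scenario; only the inputs change."""
    rng = random.Random(seed)
    roles = {k: rng.randrange(2) for k in features}
    roles.update(clamped_labels)  # observed annotations override random init
    for _ in range(n_sweeps):
        sweep(roles, set(clamped_labels), features, aligned, 2, rng, use_alignments)
    return roles

mono = run({}, use_alignments=False)              # monolingual data only
parallel = run({}, use_alignments=True)           # parallel corpus
other_labeled = run({("de", 0): 0, ("de", 1): 1}, # other language annotated
                    use_alignments=True)
target_labeled = run({("en", 0): 0},              # small target annotation
                     use_alignments=True)
```

In this sketch the four settings listed in the abstract map onto four calls to the same `run` function, which is the sense in which the comparison is direct: nothing about the toy model changes between scenarios.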