Learning Frames from Text with an Unsupervised Latent Variable Model
This work addresses the challenge of automatically extracting event structures from corpora for natural language processing applications, but it is incremental as it builds on existing probabilistic methods.
The authors tackled the problem of discovering semantic frames from text using an unsupervised latent variable model, resulting in a model that learns some novel frames compared to FrameNet.
We develop a probabilistic latent-variable model to discover semantic frames---types of events and their participants---from corpora. We present a Dirichlet-multinomial model in which frames are latent categories that explain the linking of verb-subject-object triples, given document-level sparsity. We analyze what the model learns, and compare it to FrameNet, noting it learns some novel and interesting frames. This document also contains a discussion of inference issues, including concentration parameter learning; and a small-scale error analysis of syntactic parsing accuracy.