Semi-Supervised Event Extraction with Paraphrase Clusters
This work addresses data scarcity in event extraction for NLP applications, although the contribution is incremental because it builds on existing self-training and clustering techniques.
The paper tackles the shortage of training data for supervised event extraction with a self-training method that bootstraps additional examples from clusters of event mentions across news articles. The method yields significant performance improvements on the ACE 2005 and TAC-KBP 2015 datasets.
Supervised event extraction systems are limited in accuracy by the lack of available training data. We present a method for self-training event extraction systems by bootstrapping additional training data. The method takes advantage of the fact that the same event instance is often mentioned multiple times across newswire articles from different sources. If our system can make a high-confidence extraction for some mentions in such a cluster, it can then acquire diverse training examples by adding the other mentions as well. Our experiments show significant performance improvements for multiple event extractors on the ACE 2005 and TAC-KBP 2015 datasets.
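The cluster-based bootstrapping step can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. The names `Mention`, `toy_extractor`, and `bootstrap_from_clusters` are hypothetical. The extractor is assumed to return an (event type, confidence) pair per mention. The 0.9 threshold is an arbitrary illustrative value. Each cluster is labeled with its single most confident prediction. Only the event type is propagated, whereas a real event extractor would also have to locate triggers and arguments in each mention, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Mention:
    """One sentence mentioning an event (hypothetical representation)."""
    text: str
    source: str  # the news outlet the article came from


# Hypothetical extractor interface: returns (event_type, confidence).
Prediction = Tuple[str, float]
Extractor = Callable[[Mention], Prediction]


def bootstrap_from_clusters(
    clusters: List[List[Mention]],
    extract: Extractor,
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> List[Tuple[Mention, str]]:
    """Return new (mention, event_type) pairs for self-training.

    Each cluster groups mentions of the same event instance. If the most
    confident prediction in a cluster clears the threshold, all mentions in
    the cluster (including the less confident paraphrases) are labeled with
    that event type. Clusters without a confident anchor are skipped.
    """
    new_examples: List[Tuple[Mention, str]] = []
    for cluster in clusters:
        if not cluster:
            continue
        preds = [(m, *extract(m)) for m in cluster]
        _, best_type, best_conf = max(preds, key=lambda p: p[2])
        if best_conf < threshold:
            continue  # no high-confidence extraction in this cluster
        for mention, _, _ in preds:
            new_examples.append((mention, best_type))
    return new_examples


def toy_extractor(m: Mention) -> Prediction:
    # Hypothetical stand-in for a trained extractor: confident only on "bombed".
    if "bombed" in m.text:
        return ("Conflict.Attack", 0.95)
    return ("Conflict.Attack", 0.40)


if __name__ == "__main__":
    clusters = [
        [Mention("Rebels bombed the airport on Monday.", "outlet_a"),
         Mention("An explosion tore through the airport terminal.", "outlet_b")],
        [Mention("The minister resigned.", "outlet_a"),
         Mention("The minister stepped down.", "outlet_c")],
    ]
    for mention, label in bootstrap_from_clusters(clusters, toy_extractor):
        print(label, "|", mention.text)
```

With the toy data, only the first cluster contains a confident anchor, so both of its mentions are returned with the label `Conflict.Attack`. The second paraphrase ("An explosion tore through...") becomes a training example that the extractor could not have labeled confidently on its own. In a self-training loop, these pairs would be added to the training set and the extractor retrained. Propagating labels across a cluster, rather than keeping only individually confident predictions, is what supplies the lexically diverse examples described above.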