Utilizing coarse-grained data in low-data settings for event extraction
This work addresses the annotation bottleneck for event extraction systems, offering a practical solution for researchers and practitioners, though it is incremental in nature.
The paper tackles the problem of expensive and error-prone annotation for event extraction by investigating coarse-grained data that is cheaper to obtain, such as document or sentence labels, in low-data settings. Results show that integrating this data, even negative documents alone, improves performance and robustness.
Annotating text data for event information extraction systems is hard, expensive, and error-prone. We investigate the feasibility of integrating coarse-grained data (document or sentence labels), which is far easier to obtain, as an alternative to annotating more documents. We use a multi-task model with two auxiliary tasks, document and sentence binary classification, in addition to the main task of token classification. We run a series of experiments that integrate this data under varying data regimes. Results show that while introducing extra coarse-grained data offers greater improvement and robustness, a gain is still possible by adding only negative documents, which contain no information on any event.
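The multi-task setup described in the abstract can be illustrated with a minimal sketch of a combined loss. This is not the authors' implementation: the names `Example`, `bce`, `multitask_loss`, and `negative_document`, the loss weights `w_doc` and `w_sent`, and the choice of cross-entropy losses are all assumptions made for illustration. The sketch shows one plausible way a document with only coarse labels could still contribute to training, by skipping the loss terms for annotation levels it lacks. It also assumes that a negative document implies all-negative sentence labels and an "outside" class for every token.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """Gold labels for one document; None means that level was not annotated."""
    doc_label: int                            # 1 if the document reports an event, else 0
    sent_labels: Optional[List[int]] = None   # one binary label per sentence
    token_labels: Optional[List[int]] = None  # one class index per token


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one predicted probability p and gold label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_loss(
    ex: Example,
    doc_prob: float,
    sent_probs: List[float],
    token_probs: List[List[float]],
    w_doc: float = 1.0,   # hypothetical weight of the document-level auxiliary task
    w_sent: float = 1.0,  # hypothetical weight of the sentence-level auxiliary task
) -> float:
    """Token loss plus weighted auxiliary losses; missing or empty label levels are skipped."""
    loss = w_doc * bce(doc_prob, ex.doc_label)
    if ex.sent_labels:
        sent = [bce(p, y) for p, y in zip(sent_probs, ex.sent_labels)]
        loss += w_sent * sum(sent) / len(sent)
    if ex.token_labels:
        # Categorical cross-entropy: negative log-probability of the gold class per token.
        tok = [-math.log(max(row[y], 1e-12)) for row, y in zip(token_probs, ex.token_labels)]
        loss += sum(tok) / len(tok)
    return loss


def negative_document(n_sents: int, n_tokens: int, outside_class: int = 0) -> Example:
    """Assumption: a document with no event yields all-negative labels at every level."""
    return Example(0, [0] * n_sents, [outside_class] * n_tokens)


# Usage: a fully annotated document versus one with only a document label.
uniform_tokens = [[0.5, 0.5], [0.5, 0.5]]
full = Example(doc_label=1, sent_labels=[1, 0], token_labels=[0, 1])
coarse = Example(doc_label=1)
full_loss = multitask_loss(full, 0.5, [0.5, 0.5], uniform_tokens)
coarse_loss = multitask_loss(coarse, 0.5, [0.5, 0.5], uniform_tokens)
```

In this sketch the coarse example contributes only the document-level term, while a negative document built with `negative_document` contributes to all three terms without any manual fine-grained annotation. In a real model the probabilities would come from task-specific heads over a shared encoder.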