CL · Feb 9, 2021

Bootstrapping Relation Extractors using Syntactic Search by Examples

arXiv:2102.05007v1 · 801 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of obtaining sufficient training data for supervised relation extraction, making it easier for non-NLP experts to create datasets.

This paper proposes a method for bootstrapping training datasets for relation extraction using syntactic search engines. The resulting models are competitive with those trained on manually annotated data and distant supervision, and outperform models trained using NLG data augmentation.

The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.
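
To make the idea of "searching for sentences that are syntactically similar to user input examples" concrete, here is a minimal sketch of by-example matching over dependency parses. It is not the search engine used in the paper: the `Token` class, the `dep_path` and `search` functions, the hand-written toy parses, and the choice to require an exact match on the trigger word are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: str
    head: int                  # index of the head token, -1 for the root
    deprel: str                # dependency label linking the token to its head
    ent: Optional[str] = None  # entity type if the token is an entity mention


Sentence = List[Token]


def ancestors(sent: Sentence, i: int) -> List[int]:
    """Indices from token i up to the root, inclusive."""
    chain = [i]
    while sent[i].head != -1:
        i = sent[i].head
        chain.append(i)
    return chain


def dep_path(sent: Sentence, a: int, b: int) -> Tuple[tuple, str, tuple]:
    """Labels from a up to the lowest common ancestor, its word, labels from b up to it."""
    up, down = ancestors(sent, a), ancestors(sent, b)
    lca = next(i for i in up if i in down)
    up_labels = tuple(sent[i].deprel for i in up[:up.index(lca)])
    down_labels = tuple(sent[i].deprel for i in down[:down.index(lca)])
    return up_labels, sent[lca].text.lower(), down_labels


def search(corpus: List[Sentence], example: Sentence, subj: int, obj: int):
    """Return (subject, object) pairs whose path and entity types match the example."""
    pattern = dep_path(example, subj, obj)
    types = (example[subj].ent, example[obj].ent)
    hits = []
    for sent in corpus:
        ents = [i for i, t in enumerate(sent) if t.ent]
        for a in ents:
            for b in ents:
                if a == b or (sent[a].ent, sent[b].ent) != types:
                    continue
                if dep_path(sent, a, b) == pattern:
                    hits.append((sent[a].text, sent[b].text))
    return hits


# User-provided example for a hypothetical "founder" relation (toy parse).
example = [Token("Jobs", 1, "nsubj", "PERSON"),
           Token("founded", -1, "root"),
           Token("Apple", 1, "obj", "ORG")]

# Toy corpus with hand-written parses; a real system would index parsed text.
corpus = [
    [Token("Gates", 1, "nsubj", "PERSON"), Token("founded", -1, "root"),
     Token("Microsoft", 1, "obj", "ORG"), Token("in", 4, "case"),
     Token("1975", 1, "obl")],
    [Token("Musk", 1, "nsubj", "PERSON"), Token("visited", -1, "root"),
     Token("Berlin", 1, "obj", "LOC")],
    [Token("Amazon", 1, "nsubj", "ORG"), Token("hired", -1, "root"),
     Token("Jassy", 1, "obj", "PERSON")],
]

hits = search(corpus, example, subj=0, obj=2)
print(hits)
```

Matched pairs such as these would serve as positive training examples for the relation. A practical system would presumably also let the user decide which words must match exactly and which may vary, which this sketch does not model.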

Code Implementations: 1 repo
