Syntactic and Semantic-driven Learning for Open Information Extraction
This work addresses data scarcity for researchers and practitioners in open information extraction, offering a novel unsupervised method that introduces a new paradigm rather than an incremental improvement.
The paper tackles the bottleneck of needing large labeled corpora for neural open information extraction by proposing a syntactic and semantic-driven learning approach that trains models without human-labeled data, achieving competitive performance with supervised state-of-the-art models.
One of the biggest bottlenecks in building accurate, high-coverage neural open IE systems is the need for large labelled corpora. The diversity of open-domain corpora and the variety of natural language expressions further exacerbate this problem. In this paper, we propose a syntactic and semantic-driven learning approach, which can learn neural open IE models without any human-labelled data by leveraging syntactic and semantic knowledge as noisier, higher-level supervision. Specifically, we first employ syntactic patterns as data labelling functions and pretrain a base model using the generated labels. Then we propose a syntactic and semantic-driven reinforcement learning algorithm, which can effectively generalize the base model to open situations with high accuracy. Experimental results show that our approach significantly outperforms the supervised counterparts, and can even achieve competitive performance to the supervised state-of-the-art (SoA) model.
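The first step above, using syntactic patterns as data labelling functions, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration rather than the paper's actual patterns or tag set: the hand-written `Token` structure stands in for a dependency parser's output, the single subject-verb-object pattern uses Universal Dependencies-style labels (`nsubj`, `obj`), and the tag names (`A0`, `P`, `A1` with `-B`/`-I` suffixes) are a hypothetical BIO-style scheme for a sequence-labelling base model. The function emits one noisy tag sequence per predicate the pattern matches and abstains (emits nothing) otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    pos: str   # assumed Universal POS tag, e.g. "VERB", "PROPN"
    dep: str   # assumed UD-style relation to the head, e.g. "nsubj", "obj"
    head: int  # index of the head token; the root points to itself


def child_with_dep(tokens: List[Token], head: int, dep: str) -> Optional[int]:
    """Return the index of the first child of `head` with relation `dep`, or None."""
    return next((i for i, t in enumerate(tokens) if t.head == head and t.dep == dep and i != head), None)


def subtree(tokens: List[Token], root: int) -> List[int]:
    """Return the sorted indices of all tokens dominated by `root`, including `root`."""
    result, frontier = [root], [root]
    while frontier:
        h = frontier.pop()
        for i, tok in enumerate(tokens):
            if tok.head == h and i != h:  # skip the self-loop of the sentence root
                result.append(i)
                frontier.append(i)
    return sorted(result)


def svo_labelling_function(tokens: List[Token]) -> List[List[str]]:
    """Hypothetical subject-verb-object labelling function.

    For every verb with both an nsubj and an obj child, emit one tag sequence
    marking the subject subtree as A0, the verb as P and the object subtree as A1.
    Verbs that do not match the pattern are skipped (the function abstains).
    """
    label_sequences = []
    for v, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        subj = child_with_dep(tokens, v, "nsubj")
        obj = child_with_dep(tokens, v, "obj")
        if subj is None or obj is None:
            continue  # pattern does not fire for this verb
        tags = ["O"] * len(tokens)
        for role, arg_head in (("A0", subj), ("A1", obj)):
            for k, i in enumerate(subtree(tokens, arg_head)):
                tags[i] = f"{role}-{'B' if k == 0 else 'I'}"
        tags[v] = "P-B"
        label_sequences.append(tags)
    return label_sequences


if __name__ == "__main__":
    # Hand-written parse of "Barack Obama visited Paris ."
    sentence = [
        Token("Barack", "PROPN", "compound", 1),
        Token("Obama", "PROPN", "nsubj", 2),
        Token("visited", "VERB", "root", 2),
        Token("Paris", "PROPN", "obj", 2),
        Token(".", "PUNCT", "punct", 2),
    ]
    print(svo_labelling_function(sentence))
    # [['A0-B', 'A0-I', 'P-B', 'A1-B', 'O']]
```

In a setup like the one the abstract describes, several such pattern functions would be run over unlabelled text and their outputs used as noisy training labels for the base model; real patterns would also need to handle prepositional arguments, passives and non-contiguous subtrees, which this sketch ignores.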