CL | AI | LG | Jan 17, 2023

Syntactically Robust Training on Partially-Observed Data for Open Information Extraction

Tsinghua
arXiv:2301.06841v1295 citations | h-index: 30 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in Open Information Extraction for NLP applications, but it is incremental as it builds on existing methods with specific enhancements.

The authors address the sensitivity of Open Information Extraction models to syntactic distribution shifts between training data and real-world data. They propose a syntactically robust training framework that combines paraphrase generation with algorithms that restore the knowledge deformed by paraphrasing. Experiments show that model performance degrades as the syntactic difference grows, while the framework keeps performance robust. The authors also introduce a new evaluation dataset, CaRB-AutoPara, for validation.

Open Information Extraction models have shown promising results with sufficient supervision. However, these models face a fundamental challenge: the syntactic distribution of the training data is only partially observed compared with the real world. In this paper, we propose a syntactically robust training framework that enables models to be trained on a syntactically abundant distribution based on diverse paraphrase generation. To tackle the intrinsic problem of knowledge deformation in paraphrasing, two algorithms based on semantic similarity matching and syntactic tree walking are used to restore the expressionally transformed knowledge. The training framework can be generally applied to other syntactically partially observable domains. Based on the proposed framework, we build a new evaluation set called CaRB-AutoPara, a syntactically diverse dataset consistent with the real-world setting, for validating the robustness of the models. Experiments, including a thorough analysis, show that the performance of the model degrades as the difference in syntactic distribution increases, while our framework gives a robust boundary. The source code is publicly available at https://github.com/qijimrc/RobustOIE.
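
The restoration step described in the abstract can be pictured with a small sketch. This is not the authors' algorithm or code from the RobustOIE repository: it swaps in plain token-overlap (Jaccard) similarity as a stand-in for semantic similarity matching, and the names `best_span`, `restore_triple`, `max_extra`, and `threshold` are hypothetical. It only illustrates the idea of re-locating each element of an extracted triple inside a paraphrased sentence and discarding paraphrases where the knowledge cannot be recovered.

```python
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace, treating commas and periods as spaces."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set overlap; an assumed stand-in for a real semantic similarity model."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_span(phrase: str, sentence: str, max_extra: int = 2) -> Tuple[str, float]:
    """Return the contiguous span of `sentence` most similar to `phrase`."""
    target = tokenize(phrase)
    words = tokenize(sentence)
    max_len = len(target) + max_extra  # allow slightly longer spans than the phrase
    best, best_score = "", 0.0
    for start in range(len(words)):
        for end in range(start + 1, min(len(words), start + max_len) + 1):
            span = words[start:end]
            score = jaccard(target, span)
            if score > best_score:  # strict '>' keeps the earliest, shortest best span
                best, best_score = " ".join(span), score
    return best, best_score


def restore_triple(
    triple: Tuple[str, str, str], paraphrase: str, threshold: float = 0.5
) -> Optional[Tuple[str, ...]]:
    """Map each triple element onto the paraphrase; give up if any element is lost."""
    restored = []
    for element in triple:
        span, score = best_span(element, paraphrase)
        if score < threshold:
            return None  # knowledge was deformed beyond recovery in this paraphrase
        restored.append(span)
    return tuple(restored)


if __name__ == "__main__":
    original_triple = ("Marie Curie", "discovered", "radium")
    paraphrase = "In 1898, radium was discovered by Marie Curie."
    print(restore_triple(original_triple, paraphrase))
```

A real system would replace `jaccard` with an embedding-based similarity and would also need the syntactic tree walking mentioned in the abstract to handle elements whose wording changes substantially; the sketch covers only the matching idea.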

Code Implementations: 1 repo