CL Jan 18, 2024

Distantly Supervised Morpho-Syntactic Model for Relation Extraction

arXiv:2401.10002v11.0

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of scalable information extraction for building rule-based systems and annotated datasets. Its contribution is incremental: it builds on existing distant supervision and pattern-based methods.

The paper tackles the problem of extracting and categorizing an unrestricted set of relationships from text using a distantly supervised morpho-syntactic model. It reaches a precision of up to 0.85 on six datasets built from Wikidata and Wikipedia, with lower recall and F1 scores.

The task of Information Extraction (IE) involves automatically converting unstructured textual content into structured data. Most research in this field concentrates on extracting all facts or a specific set of relationships from documents. In this paper, we present a method for the extraction and categorisation of an unrestricted set of relationships from text. Our method relies on morpho-syntactic extraction patterns obtained by a distant supervision method, and creates Syntactic and Semantic Indices to extract and classify candidate graphs. We evaluate our approach on six datasets built on Wikidata and Wikipedia. The evaluation shows that our approach can achieve Precision scores of up to 0.85, but with lower Recall and F1 scores. Our approach makes it possible to quickly create rule-based systems for Information Extraction and to build annotated datasets to train machine-learning and deep-learning-based classifiers.
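The reported pattern of high Precision with lower Recall and F1 follows from how the three scores are defined. The sketch below shows the standard definitions only. The function name `prf1` and the counts passed to it are hypothetical, chosen for illustration, and are not results or code from the paper.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw extraction counts.

    tp: extracted relations that are correct
    fp: extracted relations that are wrong
    fn: correct relations the system failed to extract
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts (NOT taken from the paper): most extracted
# relations are correct, but many true relations are missed.
precision, recall, f1 = prf1(tp=85, fp=15, fn=115)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two scores, so a precise pattern-based extractor that misses many relations ends up with an F1 well below its Precision.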
