CL AI Jun 7, 2022

Guidelines and a Corpus for Extracting Biographical Events

Marco Antonio Stranisci, Enrico Mensa, Ousmane Diakite, Daniele Radicioni, Rossana Damiano

arXiv:2206.03547v131.0584 citations h-index: 21

Originality Synthesis-oriented

AI Analysis

This work addresses the problem of limited structured biographical data for underrepresented groups, such as non-Western authors and ethnic minorities, by providing interoperable guidelines and expanding existing resources, though it is incremental in nature.

The authors tackled the limited resources for automatically extracting biographical events by creating semantic annotation guidelines and a corpus from Wikipedia biographies of underrepresented writers, achieving an average Inter-Annotator Agreement of 0.825 on 1,000 annotated sentences.

Although biographies are widespread within the Semantic Web, resources and approaches for automatically extracting biographical events are limited. This limitation reduces the amount of structured, machine-readable biographical information, especially about people belonging to underrepresented groups. Our work addresses this limitation by providing a set of guidelines for the semantic annotation of life events. The guidelines are designed to be interoperable with existing ISO standards for semantic annotation: ISO-TimeML (ISO-24617-1) and SemAF (ISO-24617-4). The guidelines were tested through an annotation task on Wikipedia biographies of underrepresented writers, namely authors born in non-Western countries, migrants, or members of ethnic minorities. Four annotators annotated 1,000 sentences with an average Inter-Annotator Agreement of 0.825. The resulting corpus was mapped onto OntoNotes. This mapping allowed us to expand our corpus, showing that existing resources can be exploited for the biographical event extraction task.
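The abstract reports an average Inter-Annotator Agreement across 4 annotators but does not say here which agreement coefficient was used. As a rough illustration only, the sketch below assumes average pairwise Cohen's kappa, which is one common way to summarize agreement among several annotators. The function names, the `EVENT`/`O` labels, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used the same single label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


def average_pairwise_kappa(annotations):
    """Mean Cohen's kappa over every pair of annotators."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): 4 annotators labelling the same 6 tokens.
annotations = [
    ["EVENT", "O", "EVENT", "O", "O", "EVENT"],
    ["EVENT", "O", "EVENT", "O", "EVENT", "EVENT"],
    ["EVENT", "O", "O", "O", "O", "EVENT"],
    ["EVENT", "O", "EVENT", "O", "O", "EVENT"],
]
score = average_pairwise_kappa(annotations)
print(f"average pairwise kappa: {score:.3f}")
```

If the paper relies on a different coefficient (for example Krippendorff's alpha or Fleiss' kappa), only `cohen_kappa` would need to be swapped out; the averaging over annotator pairs is specific to the pairwise assumption made here.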
