CL · Jul 30, 2024

Event-Arguments Extraction Corpus and Modeling using BERT for Arabic

Alaa Aljabari, Lina Duaibes, Mustafa Jarrar, Mohammed Khalilia

arXiv:2407.21153v1 15.430 citations h-index: 13


AI Analysis

This work addresses the challenge of sparse linguistic resources for event-argument extraction in Arabic, providing a new corpus and method that could benefit NLP researchers and practitioners in Arabic language processing.

The authors introduce the HADATH corpus (550k tokens) with event-argument annotations for Arabic, together with a BERT-based method that treats event-argument relation extraction as text entailment and achieves an F1-score of 94.01%.

Event-argument extraction is a challenging task, particularly in Arabic due to sparse linguistic resources. To fill this gap, we introduce the HADATH corpus (550k tokens) as an extension of Wojood, enriched with event-argument annotations. We used three types of event arguments: agent, location, and date, which we annotated as relation types. Our inter-annotator agreement evaluation resulted in an 82.23% Kappa score and an 87.2% F1-score. Additionally, we propose a novel method for event relation extraction using BERT, in which we treat the task as text entailment. This method achieves an F1-score of 94.01%. To further evaluate the generalization of our proposed method, we collected and annotated another out-of-domain corpus (about 80k tokens) called \testNLI and used it as a second test set, on which our approach achieved promising results (83.59% F1-score). Last but not least, we propose an end-to-end system for event-argument extraction. This system is implemented as part of SinaTools, and both corpora are publicly available at https://sina.birzeit.edu/wojood.
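
To make the entailment framing more concrete, the following is a minimal sketch of how candidate event-argument pairs might be turned into premise/hypothesis examples for a BERT-style sentence-pair classifier. It is not the authors' implementation: the `Mention` class, the English hypothesis templates, the tag-to-role mapping in `ROLE_TAGS`, and the `build_pairs` function are all hypothetical names and choices made for illustration; only the three relation types (agent, location, date) come from the abstract.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    """A tagged span in a sentence (hypothetical structure for this sketch)."""
    text: str
    tag: str  # e.g. "EVENT", "PERS", "ORG", "GPE", "LOC", "DATE"


# Hypothetical hypothesis templates, one per relation type named in the abstract.
TEMPLATES = {
    "agent": "{arg} is the agent of {event}",
    "location": "{arg} is the location of {event}",
    "date": "{arg} is the date of {event}",
}

# Assumed mapping from relation type to the entity tags allowed to fill it.
ROLE_TAGS = {
    "agent": {"PERS", "ORG"},
    "location": {"LOC", "GPE"},
    "date": {"DATE"},
}


def build_pairs(sentence, mentions, gold=frozenset()):
    """Return (premise, hypothesis, label) triples for every compatible
    event/argument combination; label is 1 if the triple is in `gold`."""
    events = [m for m in mentions if m.tag == "EVENT"]
    args = [m for m in mentions if m.tag != "EVENT"]
    pairs = []
    for event, arg in product(events, args):
        for role, allowed_tags in ROLE_TAGS.items():
            if arg.tag not in allowed_tags:
                continue  # skip type-incompatible candidates
            hypothesis = TEMPLATES[role].format(arg=arg.text, event=event.text)
            label = int((event.text, role, arg.text) in gold)
            pairs.append((sentence, hypothesis, label))
    return pairs
```

Each resulting triple could then be fed to a sentence-pair classifier, with the sentence as the premise and the templated statement as the hypothesis, so that relation extraction reduces to a binary entailment decision per candidate.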
