Persian Causality Corpus (PerCause) and the Causality Detection Benchmark
This work addresses causality detection in low-resource languages such as Persian by providing a new corpus and benchmark; the contribution is incremental, since it applies existing methods to a new dataset.
The authors tackled the challenge of recognizing causal elements and relations in Persian text by creating a human-annotated corpus with 4446 sentences and 5128 causal relations, and used it to train systems for detection, achieving an F-measure of 0.76 with CRF and 91.4% accuracy with Bi-LSTM-CRF.
Recognizing causal elements and causal relations in text is a challenging problem in natural language processing, particularly for low-resource languages such as Persian. In this research we prepare a human-annotated causality corpus for the Persian language, consisting of 4446 sentences and 5128 causal relations; for each relation, three labels are specified: cause, effect, and, where possible, causal mark. We have used this corpus to train a system for detecting the boundaries of causal elements. We also present a causality detection benchmark for three machine learning methods and two deep learning systems based on this corpus. Performance evaluations indicate that our best overall result is obtained with the CRF classifier, which has an F-measure of 0.76, and that the best accuracy is obtained with the Bi-LSTM-CRF deep learning method, at 91.4%.
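To make the boundary-detection task and its two reported metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes a BIO tagging scheme over tokens, the label names `CAUSE`, `EFFECT`, and `MARK`, a token-level accuracy, and an exact-match span-level F-measure; none of these details are stated in the abstract, and the English example sentence is purely illustrative.

```python
from typing import List, Tuple

# (start, end_exclusive, label); label names are assumptions, not from the corpus.
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode labeled token spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; orphan I- tags are ignored."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def span_f1(gold: List[str], pred: List[str]) -> float:
    """F-measure over spans that match exactly in boundaries and label."""
    g, p = set(bio_to_spans(gold)), set(bio_to_spans(pred))
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence: "the road flooded because it rained"
gold = spans_to_bio(6, [(0, 3, "EFFECT"), (3, 4, "MARK"), (4, 6, "CAUSE")])
# A hypothetical prediction that cuts the effect span one token short.
pred = ["B-EFFECT", "I-EFFECT", "O", "B-MARK", "B-CAUSE", "I-CAUSE"]

print(token_accuracy(gold, pred))  # 5 of 6 tokens correct
print(span_f1(gold, pred))         # 2 of 3 spans match exactly
```

Under these assumptions a single boundary error costs one token of accuracy but an entire span of F-measure, so an accuracy figure and an F-measure figure are not directly comparable when ranking systems.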