CL · Oct 25, 2015

Statistical Parsing by Machine Learning from a Classical Arabic Treebank

arXiv:1510.07193v1 · 43 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of parsing Classical Arabic, an understudied language in computational linguistics, with incremental improvements over existing methods.

The research compares a pure dependency parser with an integrated dependency-constituency model for Classical Arabic. The integrated model achieved an F1-score of 89.03%, compared with 87.47% for the dependency-only baseline.

Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic.
