CLIR, Mar 6, 2016

Semi-Automatic Data Annotation, POS Tagging and Mildly Context-Sensitive Disambiguation: the eXtended Revised AraMorph (XRAM)

arXiv:1603.01833v1, 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for improved morphological analysis tools for Classical and contemporary Arabic texts, though it appears incremental as it builds upon existing resources.

The paper addresses weaknesses and inconsistencies in Tim Buckwalter's AraMorph by presenting XRAM, an extended and revised version of it. The authors report a remarkable success level when testing XRAM through a front-end Python module.

An extended, revised form of Tim Buckwalter's Arabic lexical and morphological resource AraMorph, eXtended Revised AraMorph (henceforth XRAM), is presented. It addresses a number of weaknesses and inconsistencies of the original model and allows wider coverage of real-world Classical and contemporary (both formal and informal) Arabic texts. Building upon previous research, XRAM enhancements include (i) flag-selectable usage markers, (ii) probabilistic, mildly context-sensitive POS tagging, filtering, disambiguation and ranking of alternative morphological analyses, and (iii) semi-automatic expansion of lexical coverage through extraction of lexical and morphological information from existing lexical resources. Testing of XRAM through a front-end Python module showed a remarkable success level.
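
To make enhancement (ii) more concrete, the following is a minimal sketch of how alternative morphological analyses of one word could be filtered and ranked using the POS tag of the preceding word. It is not XRAM's actual code or API: the `Analysis` class, the `rank_analyses` function, the tag names, the threshold, and all probabilities are hypothetical and invented for illustration. The example uses the unvocalized string "ktb", which can be read as "kataba" (he wrote) or "kutub" (books).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One candidate morphological analysis of a word (hypothetical structure)."""
    vocalization: str  # romanized, vocalized reading
    pos: str           # part-of-speech tag (illustrative tag names)
    gloss: str


# Toy transition probabilities P(pos | previous pos).
# These numbers are invented for illustration; they are not taken from XRAM.
TRANSITIONS = {
    ("PREP", "NOUN"): 0.90,
    ("PREP", "VERB_PERFECT"): 0.02,
    ("NOUN", "NOUN"): 0.40,
    ("NOUN", "VERB_PERFECT"): 0.30,
}


def rank_analyses(candidates, prev_pos, transitions=TRANSITIONS, threshold=0.05):
    """Score each candidate by P(pos | prev_pos), drop unlikely ones, sort best first."""
    scored = [(transitions.get((prev_pos, a.pos), 0.0), a) for a in candidates]
    kept = [(p, a) for p, a in scored if p >= threshold]  # filtering step
    kept.sort(key=lambda pair: pair[0], reverse=True)     # ranking step
    return kept


candidates = [
    Analysis("kataba", "VERB_PERFECT", "he wrote"),
    Analysis("kutub", "NOUN", "books"),
]

for prev in ("PREP", "NOUN"):
    ranked = rank_analyses(candidates, prev_pos=prev)
    print(prev, [(a.vocalization, p) for p, a in ranked])
```

With these toy numbers, the verbal reading is filtered out after a preposition, while both readings survive after a noun, with the nominal one ranked first. A context window of a single preceding tag is only one possible reading of "mildly context-sensitive"; the paper itself should be consulted for the actual model.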
