A prototype for projecting HPSG syntactic lexica towards LMF
The work addresses a specific linguistic engineering challenge for Arabic NLP researchers, though it appears incremental because it builds on existing formalisms such as HPSG and LMF.
The paper tackles the problem of comparing heterogeneous Arabic HPSG grammar lexica by developing a prototype that projects them to a normalized LMF-compliant pivot language using a rule system, enabling systematic comparison and potential merging.
The comparative evaluation of Arabic HPSG grammar lexica requires a deep study of their linguistic coverage. The complexity of this task stems mainly from the heterogeneity of the descriptive components within those lexica (for example, the underlying linguistic resources and the differing data categories). It is therefore essential to define more homogeneous representations, which in turn will enable us to compare the lexica and eventually merge them. In this context, we present a method for comparing HPSG lexica based on a rule system. This method is implemented within a prototype for the projection from Arabic HPSG to a normalised pivot language compliant with LMF (ISO 24613 - Lexical Markup Framework) and serialised using a TEI (Text Encoding Initiative) based representation. The design of this system is based on an initial study of the HPSG formalism and its adequacy for the representation of Arabic; from this study, we identify the appropriate feature structures corresponding to each Arabic lexical category and their possible LMF counterparts.
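To make the idea of a rule-based projection concrete, the following is a minimal sketch in Python and not the prototype itself. It assumes that each HPSG lexical entry is available as a nested feature structure, and that a rule simply maps a feature path to a pivot attribute. The feature paths, the rule format, the attribute names (writtenForm, partOfSpeech) and the sample entries are all hypothetical; the element names (entry, form, orth, gramGrp, pos) are borrowed from the TEI dictionary module, but the exact serialisation used by the prototype is not specified here.

```python
import xml.etree.ElementTree as ET

# Two hypothetical lexica describing the same verb with different feature geometry.
entry_a = {
    "PHON": "kataba",
    "SYNSEM": {"LOC": {"CAT": {"HEAD": "verb"}}},
}
entry_b = {"ORTH": "kataba", "HEAD": "verb"}

# Hypothetical projection rules: (feature path in the source lexicon, pivot attribute).
RULES_A = [
    (("PHON",), "writtenForm"),
    (("SYNSEM", "LOC", "CAT", "HEAD"), "partOfSpeech"),
]
RULES_B = [
    (("ORTH",), "writtenForm"),
    (("HEAD",), "partOfSpeech"),
]


def get_path(fs, path):
    """Follow a feature path through a nested dict; return None if it is absent."""
    for key in path:
        if not isinstance(fs, dict) or key not in fs:
            return None
        fs = fs[key]
    return fs


def project(fs, rules):
    """Apply every rule whose source path exists and collect the pivot attributes."""
    pivot = {}
    for path, target in rules:
        value = get_path(fs, path)
        if value is not None:
            pivot[target] = value
    return pivot


def to_tei(pivot):
    """Serialise a pivot entry as a small TEI-style <entry> element (illustrative only)."""
    entry = ET.Element("entry")
    if "writtenForm" in pivot:
        form = ET.SubElement(entry, "form", type="lemma")
        ET.SubElement(form, "orth").text = pivot["writtenForm"]
    if "partOfSpeech" in pivot:
        gram = ET.SubElement(entry, "gramGrp")
        ET.SubElement(gram, "pos").text = pivot["partOfSpeech"]
    return ET.tostring(entry, encoding="unicode")


pivot_a = project(entry_a, RULES_A)
pivot_b = project(entry_b, RULES_B)
print(pivot_a == pivot_b)  # the two entries become comparable once projected
print(to_tei(pivot_a))
```

Once both lexica are expressed in the same pivot vocabulary, comparison reduces to comparing pivot entries, and merging reduces to combining entries that share a written form. A real rule system would also need to handle valence lists, morphological features and rules whose conditions depend on the lexical category.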