CL · Oct 17, 2014

Arabic Language Text Classification Using Dependency Syntax-Based Feature Selection

Yannis Haralambous, Yassir Elidrissi, Philippe Lenca

arXiv:1410.4863v1 · 18 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses Arabic text classification. It presents incremental improvements by matching combinations of methods to specific feature-set sizes.

The study compares feature selection methods (tfidf vs. dependency syntax) and classifiers (class association rules vs. support vector machines) on rootified and lightly stemmed Arabic text. It finds that lightly stemmed text performs better than rootified text, that class association rules work well with small feature sets derived from dependency syntax, and that support vector machines excel with large feature sets built from morphological criteria.
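To make the difference between the two text forms concrete, here is a minimal Python sketch of light stemming. It assumes a simplified affix list in the style of the Light10 stemmer of Larkey et al.; the listing does not say which stemmer the authors used, so the affix lists, the length thresholds, and the function name `light_stem` are illustrative assumptions. Light stemming strips common prefixes and suffixes but keeps the pattern-bearing stem, whereas rootification would reduce a word such as مكتب further to its triliteral root كتب.

```python
"""Simplified Light10-style Arabic light stemmer (hypothetical sketch, not the authors' tool)."""

# Definite-article prefixes, longest first so "وال" is tried before "ال".
ARTICLES = ["وال", "بال", "كال", "فال", "لل", "ال"]
# Suffixes checked once, in this order; several may be stripped in sequence.
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    # Drop the conjunction "و" (and) if at least 3 characters remain (assumed threshold).
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Drop at most one definite-article prefix if at least 2 characters remain.
    for prefix in ARTICLES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # Single pass over the suffix list, stripping each suffix found at the end.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المكتبات", "كتابها"]:
        print(w, "->", light_stem(w))
```

Under these assumptions, والكتاب and كتابها both map to كتاب, while المكتبات maps to مكتب; a root-based analyzer would collapse all three to كتب, which is the coarser representation the study found less effective.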

We study the performance of Arabic text classification combining various techniques: (a) tfidf vs. dependency syntax, for feature selection and weighting; (b) class association rules vs. support vector machines, for classification. The Arabic text is used in two forms: rootified and lightly stemmed. The results we obtain show that lightly stemmed text leads to better performance than rootified text; that class association rules are better suited for small feature sets obtained by dependency syntax constraints; and, finally, that support vector machines are better suited for large feature sets based on morphological feature selection criteria.
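The tfidf side of the comparison can be sketched as follows. This is a minimal, standard-library Python illustration, not the authors' pipeline: it assumes tokens have already been stemmed, uses the common weighting tf × log(N/df), and ranks terms by their summed tf-idf over the corpus before keeping the top k. The scoring rule, the function names `tfidf_weights`, `select_features` and `vectorize`, and the toy transliterated tokens are hypothetical; the dependency-syntax selection that the paper contrasts with tfidf is not shown.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weighted


def select_features(docs, k):
    """Keep the k terms with the highest tf-idf summed over the corpus (assumed criterion)."""
    totals = Counter()
    for weights in tfidf_weights(docs):
        totals.update(weights)  # Counter.update adds the float weights term by term
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(doc_weights, features):
    """Project one document's tf-idf weights onto the selected feature list."""
    return [doc_weights.get(t, 0.0) for t in features]


if __name__ == "__main__":
    # Hypothetical toy corpus of already-stemmed, transliterated tokens.
    docs = [["ktb", "drs", "ktb"], ["ktb", "lab"], ["drs", "mal", "mal"]]
    features = select_features(docs, 2)
    print(features)
    for weights in tfidf_weights(docs):
        print(vectorize(weights, features))
```

With k = 2 the toy corpus keeps `mal` and `ktb`. Varying `k` is how one would produce the small and large feature sets whose interaction with class association rules and support vector machines the abstract compares.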

