CL · Feb 4, 2023

A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms

Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas

arXiv:2302.02232 · h-index: 29 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for improved synonym enrichment in Arabic language processing, though it is incremental as it builds on existing lexicons and focuses on a specific domain.

The paper tackles the problem of extending synsets with additional synonyms, treating synonymy strength as a fuzzy value. It presents an algorithm that computes these values and a benchmark dataset of 3K candidate synonyms for 500 synsets. Evaluations using RMSE and MAE show that the algorithm's fuzzy values are close to those assigned by linguists.

This paper addresses the task of extending a given synset with additional synonyms, taking into account synonymy strength as a fuzzy value. Given a mono/multilingual synset and a threshold (a fuzzy value in [0, 1]), our goal is to extract new synonyms above this threshold from existing lexicons. Our contribution is twofold: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important for (i) understanding how much linguists (dis/)agree on synonymy, and (ii) serving as a baseline to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations show that the algorithm behaves like a linguist and that its fuzzy values are close to those proposed by linguists (measured with RMSE and MAE). The dataset and a demo page are publicly available at https://portal.sina.birzeit.edu/synonyms.
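To make the evaluation setup concrete, here is a minimal sketch of how fuzzy values from an algorithm could be compared with linguist annotations using RMSE and MAE, and how a threshold could select candidates. It is not the authors' implementation. The function names, the averaging of the four annotations into one reference value, the inclusive threshold comparison, and the sample numbers are all assumptions made for illustration.

```python
import math


def mean_annotation(annotations):
    """Average several linguists' fuzzy values for one candidate.

    Averaging is an assumption; the abstract does not say how the
    four annotations are combined.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    return sum(annotations) / len(annotations)


def rmse(predicted, reference):
    """Root mean squared error between two equal-length score lists."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, reference):
    """Mean absolute error between two equal-length score lists."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def filter_candidates(scores, threshold):
    """Keep candidates whose fuzzy value reaches the threshold.

    The inclusive comparison (>=) is an assumption.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [word for word, score in scores.items() if score >= threshold]


if __name__ == "__main__":
    # Hypothetical candidates and scores, invented for this sketch.
    algorithm_scores = {"candidate_a": 0.9, "candidate_b": 0.6, "candidate_c": 0.2}
    linguist_scores = {
        "candidate_a": [1.0, 1.0, 0.9, 1.0],
        "candidate_b": [0.5, 0.6, 0.4, 0.5],
        "candidate_c": [0.0, 0.1, 0.0, 0.0],
    }

    words = list(algorithm_scores)
    predicted = [algorithm_scores[w] for w in words]
    reference = [mean_annotation(linguist_scores[w]) for w in words]

    print("RMSE:", rmse(predicted, reference))
    print("MAE:", mae(predicted, reference))
    print("Above 0.5:", filter_candidates(algorithm_scores, 0.5))
```

Lower RMSE and MAE mean the computed fuzzy values sit closer to the linguists' judgments. RMSE penalizes large individual disagreements more heavily than MAE does.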

