CL, AI · Oct 22, 2023

ARCOQ: Arabic Closest Opposite Questions Dataset

Sandra Rizkallah, Amir F. Atiya, Samir Shaheen

arXiv:2310.14384v1

Originality: Synthesis-oriented

AI Analysis

This dataset addresses the lack of resources for antonymy detection in Arabic, which benefits NLP researchers and practitioners. The contribution is incremental, however, because it adapts the structure of an existing English dataset to Arabic.

The authors introduce ARCOQ, the first dataset of closest opposite questions in Arabic. It contains 500 questions for evaluating antonymy detection systems, and the authors benchmark various Arabic word embedding models on it.

This paper presents a dataset of closest opposite questions in the Arabic language. The dataset is the first of its kind for Arabic and is useful for assessing systems on antonymy detection. Its structure is similar to that of the Graduate Record Examination (GRE) closest opposite questions dataset for English. The dataset consists of 500 questions, each containing a query word whose closest opposite must be chosen from a set of candidate words. Each question is also associated with the correct answer. We release the dataset publicly and provide standard splits into development and test sets. Moreover, the paper provides a benchmark of the performance of different Arabic word embedding models on the dataset.
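To make the question format concrete, the following is a minimal sketch of how a closest opposite question could be represented and scored with word embeddings. It is not the authors' method or the released file format: the `Question` fields, the `predict` and `accuracy` helpers, the English stand-in words, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration. The selection rule, picking the candidate with the lowest cosine similarity to the query, is an assumed baseline heuristic.

```python
import math
from dataclasses import dataclass


@dataclass
class Question:
    """One closest opposite question (hypothetical field names)."""
    query: str
    candidates: list[str]
    answer: str


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(question, embeddings):
    """Assumed heuristic: return the candidate least similar to the query word."""
    query_vec = embeddings[question.query]
    return min(question.candidates, key=lambda w: cosine(query_vec, embeddings[w]))


def accuracy(questions, embeddings):
    """Fraction of questions whose predicted candidate matches the stored answer."""
    correct = sum(predict(q, embeddings) == q.answer for q in questions)
    return correct / len(questions)


# Toy data, invented for this example; not taken from ARCOQ.
toy_embeddings = {
    "hot": [1.0, 0.2],
    "cold": [-1.0, 0.1],
    "warm": [0.8, 0.3],
    "spicy": [0.5, 0.9],
}
toy_question = Question(query="hot", candidates=["warm", "cold", "spicy"], answer="cold")

print(predict(toy_question, toy_embeddings))
print(accuracy([toy_question], toy_embeddings))
```

Distributional embeddings often place antonyms close to each other because they appear in similar contexts, so a lowest-similarity rule like this one should be read as a simple baseline rather than a strong antonymy detector.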
