CL LG May 24, 2024

Word Sense Disambiguation in Persian: Can AI Finally Get It Right?

Seyed Moein Ayyoubzadeh, Kourosh Shahnazari

arXiv:2406.00028v3 1.0 h-index: 2

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific problem for Persian NLP researchers and practitioners, but it is incremental as it applies existing methods to new data.

The study tackles homograph disambiguation in Persian by introducing a new dataset and evaluating various embeddings and models on Accuracy, Recall, and F1 Score, though specific numbers are not provided.

Homograph disambiguation, the task of distinguishing words with identical spellings but different meanings, poses a substantial challenge in natural language processing. In this study, we introduce a novel dataset tailored for Persian homograph disambiguation. Our work encompasses a thorough exploration of various embeddings, evaluated both through the cosine similarity method and through their efficacy in downstream tasks such as classification. Our investigation entails training a diverse array of lightweight machine learning and deep learning models for homograph disambiguation. We scrutinize the models' performance in terms of Accuracy, Recall, and F1 Score, thereby gaining insights into their respective strengths and limitations. The outcomes of our research underscore three key contributions. First, we present a newly curated Persian dataset, providing a solid foundation for future research in homograph disambiguation. Second, our comparative analysis of embeddings highlights their utility in different contexts, enriching the understanding of their capabilities. Third, by training and evaluating a spectrum of models, we extend valuable guidance for practitioners in selecting suitable strategies for homograph disambiguation tasks. In summary, our study unveils a new dataset, scrutinizes embeddings from diverse perspectives, and benchmarks various models for homograph disambiguation. These findings empower researchers and practitioners to navigate the intricate landscape of homograph-related challenges effectively.
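To make the evaluation setup in the abstract more concrete, the following is a minimal sketch of cosine-similarity-based disambiguation scored with Accuracy, Recall, and F1. It is not the authors' pipeline: the nearest-centroid rule, the macro averaging, the helper names (`cosine`, `sense_centroids`, `predict_sense`, `macro_scores`), and the three-dimensional toy vectors are all assumptions made for illustration. The example word is the Persian homograph شیر, which can mean "lion" or "milk" among other senses; in practice the vectors would come from a contextual embedding model.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sense_centroids(examples):
    """Average the context vectors of each sense: label -> mean vector."""
    grouped = defaultdict(list)
    for vec, label in examples:
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict_sense(vec, centroids):
    """Pick the sense whose centroid is most similar to the context vector."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def macro_scores(gold, pred):
    """Return (accuracy, macro recall, macro F1) over the gold label set."""
    labels = sorted(set(gold))
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    recalls, f1s = [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1s.append(f1)
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Toy, made-up context embeddings for the homograph شیر (illustrative only).
train = [
    ([0.9, 0.1, 0.0], "lion"),
    ([0.8, 0.2, 0.1], "lion"),
    ([0.1, 0.9, 0.2], "milk"),
    ([0.0, 0.8, 0.3], "milk"),
]
test = [
    ([0.7, 0.3, 0.0], "lion"),
    ([0.2, 0.7, 0.1], "milk"),
    ([0.5, 0.6, 0.1], "lion"),  # ambiguous context, close to both senses
]

centroids = sense_centroids(train)
gold = [label for _, label in test]
predictions = [predict_sense(vec, centroids) for vec, _ in test]
accuracy, recall, f1 = macro_scores(gold, predictions)
print(predictions, round(accuracy, 3), round(recall, 3), round(f1, 3))
```

The nearest-centroid rule is only one way to turn cosine similarity into a decision; the same metric function could score any of the lightweight classifiers the abstract mentions.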
