Hassan Haji Mohammadi

h-index19

3papers

5citations

Novelty23%

AI Score17

Ranked #200,180 of 205,806 authors (top 97%)#31,940 in CL (top 98%)

3 Papers

CLNov 11, 2022

A hybrid entity-centric approach to Persian pronoun resolution

Hassan Haji Mohammadi, Alireza Talebpour, Ahmad Mahmoudi Aznaveh et al.

Pronoun resolution is a challenging subset of an essential field in natural language processing called coreference resolution. Coreference resolution is about finding all entities in the text that refers to the same real-world entity. This paper presents a hybrid model combining multiple rulebased sieves with a machine-learning sieve for pronouns. For this purpose, seven high-precision rule-based sieves are designed for the Persian language. Then, a random forest classifier links pronouns to the previous partial clusters. The presented method demonstrates exemplary performance using pipeline design and combining the advantages of machine learning and rulebased methods. This method has solved some challenges in end-to-end models. In this paper, the authors develop a Persian coreference corpus called Mehr in the form of 400 documents. This corpus fixes some weaknesses of the previous corpora in the Persian language. Finally, the efficiency of the presented system compared to the earlier model in Persian is reported by evaluating the proposed method on the Mehr and Uppsala test sets.

CLNov 8, 2022

Review of coreference resolution in English and Persian

Hassan Haji Mohammadi, Alireza Talebpour, Ahmad Mahmoudi Aznaveh et al.

Coreference resolution (CR), identifying expressions referring to the same real-world entity, is a fundamental challenge in natural language processing (NLP). This paper explores the latest advancements in CR, spanning coreference and anaphora resolution. We critically analyze the diverse corpora that have fueled CR research, highlighting their strengths, limitations, and suitability for various tasks. We examine the spectrum of evaluation metrics used to assess CR systems, emphasizing their advantages, disadvantages, and the need for more nuanced, task-specific metrics. Tracing the evolution of CR algorithms, we provide a detailed overview of methodologies, from rule-based approaches to cutting-edge deep learning architectures. We delve into mention-pair, entity-based, cluster-ranking, sequence-to-sequence, and graph neural network models, elucidating their theoretical foundations and performance on benchmark datasets. Recognizing the unique challenges of Persian CR, we dedicate a focused analysis to this under-resourced language. We examine existing Persian CR systems and highlight the emergence of end-to-end neural models leveraging pre-trained language models like ParsBERT. This review is an essential resource for researchers and practitioners, offering a comprehensive overview of the current state-of-the-art in CR, identifying key challenges, and charting a course for future research in this rapidly evolving field.

CLMay 17, 2024

Persian Pronoun Resolution: Leveraging Neural Networks and Language Models

Hassan Haji Mohammadi, Alireza Talebpour, Ahmad Mahmoudi Aznaveh et al.

Coreference resolution, critical for identifying textual entities referencing the same entity, faces challenges in pronoun resolution, particularly identifying pronoun antecedents. Existing methods often treat pronoun resolution as a separate task from mention detection, potentially missing valuable information. This study proposes the first end-to-end neural network system for Persian pronoun resolution, leveraging pre-trained Transformer models like ParsBERT. Our system jointly optimizes both mention detection and antecedent linking, achieving a 3.37 F1 score improvement over the previous state-of-the-art system (which relied on rule-based and statistical methods) on the Mehr corpus. This significant improvement demonstrates the effectiveness of combining neural networks with linguistic models, potentially marking a significant advancement in Persian pronoun resolution and paving the way for further research in this under-explored area.