Moataz Ahmed

h-index: 23

9 papers

53 citations

Novelty: 30%

AI Score: 36

Ranked #96,358 of 194,257 authors (top 50%); #1,038 in SE (top 34%)

9 Papers

1.3 · CL · Jul 12, 2023 · Code

Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep Learning Approaches

Zaid Alyafeai, Maged S. Al-Shaibani, Moataz Ahmed

Poetry holds immense significance within the cultural and traditional fabric of any nation. It serves as a vehicle for poets to articulate their emotions, preserve customs, and convey the essence of their culture. Arabic poetry is no exception: it has played a cherished role in the heritage of the Arabic community throughout history and remains relevant today. Typically, understanding Arabic poetry requires a linguist who can analyze its content and assess its quality. This paper introduces Ashaar (https://github.com/ARBML/Ashaar), a framework comprising a collection of datasets and pre-trained models designed specifically for the analysis and generation of Arabic poetry. Our pipeline covers several aspects of poetry, including meter, theme, and era classification. It also incorporates automatic poetry diacritization, which enables more intricate analyses such as automated extraction of the Arudi style. Additionally, we explore the feasibility of conditional poetry generation by pre-training a character-based GPT model. As part of this effort, we also release four datasets: one for poetry generation, one for diacritization, and two for Arudi-style prediction. These datasets aim to facilitate research and development on Arabic poetry by enabling researchers and enthusiasts to explore the nuances of this rich literary tradition.

11.3 · SE · Apr 1, 2025 · Code

Leveraging LLMs for User Stories in AI Systems: UStAI Dataset

Asma Yamani, Malak Baslyman, Moataz Ahmed

AI systems are gaining widespread adoption across various sectors and domains. Creating high-quality AI system requirements is crucial for aligning an AI system with business goals and consumer values, and for ensuring social responsibility. However, given the uncertain nature of AI systems and their heavy reliance on sensitive data, more research is needed on eliciting and analyzing AI system requirements. Because many AI systems are proprietary, open-source requirements artifacts and technical requirements documents for AI systems are scarce, which limits broader research and investigation. With Large Language Models (LLMs) emerging as a promising alternative to human-generated text, this paper investigates the potential use of LLMs to generate user stories for AI systems based on abstracts from scholarly papers. We conducted an empirical evaluation using three LLMs and generated 1,260 user stories from 42 abstracts spanning 26 domains. We assess their quality using the Quality User Story (QUS) framework. Moreover, we identify relevant non-functional requirements (NFRs) and ethical principles. Our analysis demonstrates that the investigated LLMs can generate user stories inspired by the needs of various stakeholders, offering a promising approach for generating user stories for research purposes and for aiding the early requirements elicitation phase of AI systems. We have compiled and curated the stories generated by the various LLMs into a dataset, UStAI, which is now publicly available.

3.4 · SE · Nov 6, 2025

Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment

Asma Yamani, Malak Baslyman, Moataz Ahmed

Large Language Models (LLMs) are increasingly employed in software engineering tasks such as requirements elicitation, design, and evaluation, raising critical questions regarding their alignment with human judgments on responsible AI values. This study investigates how closely LLMs' value preferences align with those of two human groups: a US-representative sample and AI practitioners. We evaluate 23 LLMs across four tasks: (T1) selecting key responsible AI values, (T2) rating their importance in specific contexts, (T3) resolving trade-offs between competing values, and (T4) prioritizing software requirements that embody those values. The results show that LLMs generally align more closely with AI practitioners than with the US-representative sample, emphasizing fairness, privacy, transparency, safety, and accountability. However, inconsistencies appear between the values that LLMs claim to uphold (Tasks 1-3) and the way they prioritize requirements (Task 4), revealing gaps in faithfulness between stated and applied behavior. These findings highlight the practical risk of relying on LLMs in requirements engineering without human oversight and motivate the need for systematic approaches to benchmark, interpret, and monitor value alignment in AI-assisted software development.

8.3 · CL · May 29, 2025 · Code

The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text

Maged S. Al-Shaibani, Moataz Ahmed

Large Language Models (LLMs) have achieved unprecedented capabilities in generating human-like text, posing subtle yet significant challenges for information integrity across critical domains, including education, social media, and academia: such text can enable sophisticated misinformation campaigns, compromise healthcare guidance, and facilitate targeted propaganda. The challenge is particularly severe in under-explored and low-resource languages such as Arabic. This paper presents a comprehensive investigation of Arabic machine-generated text, examining multiple generation strategies (generation from the title only, content-aware generation, and text refinement) across diverse model architectures (ALLaM, Jais, Llama, and GPT-4) in the academic and social media domains. Our stylometric analysis reveals distinctive linguistic patterns that differentiate human-written from machine-generated Arabic text across these varied contexts. Despite their human-like qualities, we demonstrate that LLMs produce detectable signatures in their Arabic outputs, with domain-specific characteristics that vary significantly between contexts. Based on these insights, we developed BERT-based detection models that achieved exceptional performance in formal contexts (up to 99.9% F1-score), with strong precision across model architectures. Our cross-domain analysis confirms generalization challenges previously reported in the literature. To the best of our knowledge, this work is the most comprehensive investigation of Arabic machine-generated text to date, uniquely combining multiple prompt-based generation methods, diverse model architectures, and in-depth stylometric analysis across varied textual domains, and it establishes a foundation for developing robust, linguistically informed detection systems essential for preserving information integrity in Arabic-language contexts.

7.1 · LG · Oct 19, 2025

Peering Inside the Black Box: Uncovering LLM Errors in Optimization Modelling through Component-Level Evaluation

Dania Refai, Moataz Ahmed

Large language models (LLMs) are increasingly used to convert natural language descriptions into mathematical optimization formulations. Current evaluations often treat formulations as a whole, relying on coarse metrics such as solution accuracy or runtime, which obscure structural or numerical errors. In this study, we present a comprehensive, component-level evaluation framework for LLM-generated formulations. Beyond the conventional optimality gap, our framework introduces metrics such as precision and recall for decision variables and constraints, constraint and objective root mean squared error (RMSE), and efficiency indicators based on token usage and latency. We evaluate GPT-5, LLaMA 3.1 Instruct, and DeepSeek Math on optimization problems of varying complexity under six prompting strategies. Results show that GPT-5 consistently outperforms the other models, with chain-of-thought, self-consistency, and modular prompting proving most effective. Our analysis indicates that solver performance depends primarily on high constraint recall and low constraint RMSE, which together ensure structural correctness and solution reliability. Constraint precision and decision-variable metrics play secondary roles, while concise outputs enhance computational efficiency. These findings highlight three principles for NLP-to-optimization modeling: (i) complete constraint coverage prevents violations, (ii) minimizing constraint RMSE ensures solver-level accuracy, and (iii) concise outputs improve computational efficiency. The proposed framework establishes a foundation for fine-grained, diagnostic evaluation of LLMs in optimization modeling.
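The component-level metrics named above can be illustrated with a minimal sketch. It assumes that the components of the generated and reference formulations (constraints or decision variables) have already been matched and reduced to comparable identifiers, and that RMSE is computed over aligned coefficient vectors. The abstract does not describe the paper's actual matching or alignment procedure, so the function names and example data below are hypothetical.

```python
import math


def precision_recall(predicted, reference):
    """Set-based precision and recall.

    Assumption: components are already matched by identity
    (e.g., normalized constraint names); this is not the paper's procedure.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def rmse(predicted_coeffs, reference_coeffs):
    """Root mean squared error between two aligned, equal-length coefficient vectors."""
    if not reference_coeffs or len(predicted_coeffs) != len(reference_coeffs):
        raise ValueError("coefficient vectors must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted_coeffs, reference_coeffs)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: the LLM recovers two of three reference constraints
# and adds one spurious constraint, so precision and recall are both 2/3.
reference_constraints = {"capacity", "demand", "budget"}
llm_constraints = {"capacity", "demand", "nonnegativity"}
print(precision_recall(llm_constraints, reference_constraints))

# Hypothetical coefficients for one constraint: a single coefficient is off by 1.
print(rmse([2.0, 3.0, 5.0], [2.0, 3.0, 4.0]))
```

In this toy case, a missing constraint lowers recall and an extra one lowers precision. This mirrors the abstract's observation that coverage (recall) and numerical fidelity (RMSE) are measured separately.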

4.0 · SE · Feb 2, 2014

A framework for reuse of multi-view UML artifacts

Hamza Onoruoiza Salami, Moataz Ahmed

Software is typically modeled from different viewpoints, such as the structural, behavioral, and functional views. Few existing works apply multi-view retrieval approaches. This study identifies a number of important issues concerning the mapping of entities during multi-view retrieval of UML models. In response, we describe a framework for reusing UML artifacts and discuss how our retrieval approach tackles the identified issues.

6.9 · SE · Feb 2, 2014

UML Artifacts Reuse: State of the Art

Hamza Onoruoiza Salami, Moataz A. Ahmed

The benefits that can be derived from reusing software include accelerated development, reduced cost, reduced risk, and effective use of specialists. Reusing software artifacts during the initial stages of software development increases these benefits, because it allows subsequent reuse of later-stage artifacts derived from the earlier ones. UML is the de facto modeling language used by software developers during the initial stages of software development, such as requirements engineering, architectural design, and detailed design. This survey analyzes previous works on UML artifact reuse from four perspectives: retrieval method, artifact support, tool support, and experiments performed. Based on this analysis, some suggestions for future work on UML artifact reuse are also provided.

4.0 · SE · Jan 29, 2014

Automatic Reference Models Development: A Framework

Mojeeb Al-Rhman Al-Khiaty, Moataz Ahmed

Software reuse allows the software industry to simultaneously reduce development cost and improve product quality. Reuse of early-stage artifacts has been acknowledged to be more beneficial than reuse of later-stage artifacts. In this regard, early-stage reference models have been considered good tools for enabling reuse across applications within the same domain. However, the literature survey reported in this paper reveals that the problem of automatically developing reference models from given instances has not yet attracted enough research attention. Accordingly, we propose a framework for building a reference model that captures the common and variable analysis/design practices across the different applications in a domain. The framework considers multi-view models when assessing the commonalities and variabilities among the given instances. It also incorporates learning capabilities so that the quality and reusability of the reference model improve as it is used.

4.0 · SE · Jan 21, 2014

Transition from Analysis to Software Design: A Review and New Perspective

Hamdi A. Al-Jamimi, Moataz Ahmed

Analysis and design are the most crucial phases of the software development life cycle. Reusing the artifacts of these early phases is very beneficial for improving productivity and software quality. In this paper, we analyze the literature on the automatic transformation of artifacts from the problem space (i.e., requirements analysis models) into artifacts in the solution space (i.e., architecture, design, and implementation code). The goal is to assess the current state of the art in automatically reusing previously developed software designs to synthesize a new design for a given requirement. We surveyed various related areas, such as model-driven development and model transformation techniques. Our analysis revealed that this topic has not yet been satisfactorily covered. Accordingly, we propose a three-stage framework to address the limitations of current approaches.