Fethi Rabhi

AI
h-index: 27
9 papers
209 citations
Novelty: 31%
AI Score: 40

9 Papers

AI · Dec 1, 2025
OntoMetric: An Ontology-Guided Framework for Automated ESG Knowledge Graph Construction

Mingqin Yu, Fethi Rabhi, Boming Xia et al.

Environmental, Social, and Governance (ESG) disclosure frameworks such as SASB, TCFD, and IFRS S2 require organizations to compute and report numerous metrics for compliance, yet these requirements are embedded in long, unstructured PDF documents that are difficult to interpret, standardize, and audit. Manual extraction is unscalable, while unconstrained large language model (LLM) extraction often produces inconsistent entities, hallucinated relationships, missing provenance, and high validation failure rates. We present OntoMetric, an ontology-guided framework that transforms ESG regulatory documents into validated, AI- and web-ready knowledge graphs. OntoMetric operates through a three-stage pipeline: (1) structure-aware segmentation using table-of-contents boundaries, (2) ontology-constrained LLM extraction that embeds the ESGMKG schema into prompts while enriching entities with semantic fields for downstream reasoning, and (3) two-phase validation that combines LLM-based semantic verification with rule-based schema checking across entity, property, and relationship levels (VR001-VR006). The framework preserves both segment-level and page-level provenance for audit traceability. Evaluated on five ESG standards (SASB Commercial Banks, SASB Semiconductors, TCFD, IFRS S2, AASB S2) totaling 228 pages and 60 segments, OntoMetric achieves 65-90% semantic accuracy and 80-90% schema compliance, compared to 3-10% for baseline unconstrained extraction, at approximately 0.01 to 0.02 USD per validated entity. Our results demonstrate that combining symbolic ontology constraints with neural extraction enables reliable, auditable knowledge graphs suitable for regulatory compliance and web integration, supporting downstream applications such as sustainable-finance analytics, transparency portals, and automated compliance tools.
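
The rule-based schema checking described in stage (3) can be illustrated with a minimal sketch. The abstract does not define the ESGMKG schema or the rules VR001-VR006, so everything here is an assumption made for illustration: the entity types (`Category`, `Metric`), the required properties, the allowed relation, and the function name `validate_graph` are hypothetical and are not taken from OntoMetric.

```python
# Hypothetical mini-schema (NOT the ESGMKG schema): entity type -> required properties.
REQUIRED_PROPS = {"Category": {"name"}, "Metric": {"name", "unit"}}
# Hypothetical allowed (source type, predicate, target type) triples.
ALLOWED_RELATIONS = {("Category", "includes", "Metric")}


def validate_graph(entities, relations):
    """Return a list of human-readable schema violations (empty if none)."""
    errors = []
    known = {}  # entity id -> entity type, for entities with a recognised type

    # Entity-level and property-level checks.
    for ent in entities:
        eid, etype = ent.get("id"), ent.get("type")
        if etype not in REQUIRED_PROPS:
            errors.append(f"{eid}: unknown entity type {etype!r}")
            continue
        known[eid] = etype
        missing = REQUIRED_PROPS[etype] - set(ent.get("properties", {}))
        if missing:
            errors.append(f"{eid}: missing properties {sorted(missing)}")
        # Provenance check: every entity must point back to a source page.
        if "page" not in ent.get("provenance", {}):
            errors.append(f"{eid}: no page-level provenance")

    # Relationship-level checks.
    for src, pred, dst in relations:
        if src not in known or dst not in known:
            errors.append(f"({src}, {pred}, {dst}): dangling endpoint")
        elif (known[src], pred, known[dst]) not in ALLOWED_RELATIONS:
            errors.append(f"({src}, {pred}, {dst}): relation not allowed by schema")
    return errors
```

Collecting violations in a list, rather than raising on the first one, lets a single pass report every failure for a segment, which suits an audit-oriented workflow.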

AI · Jan 7
Architecting Agentic Communities using Design Patterns

Zoran Milosevic, Fethi Rabhi

The rapid evolution of Large Language Models (LLMs) and subsequent Agentic AI technologies requires systematic architectural guidance for building sophisticated, production-grade systems. This paper presents an approach for architecting such systems using design patterns derived from enterprise distributed systems standards, formal methods, and industry practice. We classify these patterns into three tiers: LLM Agents (task-specific automation), Agentic AI (adaptive goal-seekers), and Agentic Communities (organizational frameworks where AI agents and human participants coordinate through formal roles, protocols, and governance structures). We focus on Agentic Communities (coordination frameworks encompassing LLM Agents, Agentic AI entities, and humans), which are most relevant for enterprise and industrial applications. Drawing on established coordination principles from distributed systems, we ground these patterns in a formal framework that specifies collaboration agreements where AI agents and humans fill roles within governed ecosystems. This approach provides both practical guidance and formal verification capabilities, enabling expression of organizational, legal, and ethical rules through accountability mechanisms that ensure operational and verifiable governance of inter-agent communication, negotiation, and intent modeling. We validate this framework through a clinical trial matching case study. Our goal is to provide actionable guidance to practitioners while maintaining the formal rigor essential for enterprise deployment in dynamic, multi-agent ecosystems.

LG · Oct 30, 2025
New Money: A Systematic Review of Synthetic Data Generation for Finance

James Meldrum, Basem Suleiman, Fethi Rabhi et al.

Synthetic data generation has emerged as a promising approach to address the challenges of using sensitive financial data in machine learning applications. By leveraging generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), it is possible to create artificial datasets that preserve the statistical properties of real financial records while mitigating privacy risks and regulatory constraints. Despite the rapid growth of this field, a comprehensive synthesis of the current research landscape has been lacking. This systematic review consolidates and analyses 72 studies published since 2018 that focus on synthetic financial data generation. We categorise the types of financial information synthesised, the generative methods employed, and the evaluation strategies used to assess data utility and privacy. The findings indicate that GAN-based approaches dominate the literature, particularly for generating time-series market data and tabular credit data. While several innovative techniques demonstrate potential for improved realism and privacy preservation, there remains a notable lack of rigorous evaluation of privacy safeguards across studies. By providing an integrated overview of generative techniques, applications, and evaluation methods, this review highlights critical research gaps and offers guidance for future work aimed at developing robust, privacy-preserving synthetic data solutions for the financial domain.

SE · Jun 17, 2024
SLEGO: A Collaborative Data Analytics System with LLM Recommender for Diverse Users

Siu Lung Ng, Hirad Baradaran Rezaei, Fethi Rabhi

This paper presents the SLEGO (Software-Lego) system, a collaborative analytics platform that bridges the gap between experienced developers and novice users using a cloud-based platform with modular, reusable microservices. These microservices enable developers to share their analytical tools and workflows, while a simple graphical user interface (GUI) allows novice users to build comprehensive analytics pipelines without programming skills. Supported by a knowledge base and a Large Language Model (LLM) powered recommendation system, SLEGO enhances the selection and integration of microservices, increasing the efficiency of analytics pipeline construction. Case studies in finance and machine learning illustrate how SLEGO promotes the sharing and assembly of modular microservices, significantly improving resource reusability and team collaboration. The results highlight SLEGO's role in democratizing data analytics by integrating modular design, knowledge bases, and recommendation systems, fostering a more inclusive and efficient analytical environment.

SE · Apr 17, 2020
Cloud Migration Methodologies: Preliminary Findings

Mahdi Fahmideh, Farhad Daneshgar, Fethi Rabhi

Research on cloud computing has largely been dedicated to addressing technical aspects associated with utilizing cloud services, surveying critical success factors for cloud adoption, and gathering opinions about its impact on IT functions. Nevertheless, research on process models for cloud migration has progressed slowly. Several methodologies have been proposed by both academia and industry for moving legacy applications to the cloud. This paper presents a criteria-based appraisal of such existing methodologies. The results of the analysis highlight the strengths and weaknesses of these methodologies and can be used by cloud service consumers for comparing and selecting the most appropriate ones that fit specific migration scenarios. The paper also suggests research opportunities to improve the status quo. Keywords: Cloud Migration; Legacy Applications; Cloud Migration Methodology; Evaluation Framework

SE · Apr 17, 2020
Challenges in migrating legacy software systems to the cloud: an empirical study

Mahdi Fahmideh, Farhad Daneshgar, Ghassan Beydoun et al.

Moving existing legacy systems to cloud platforms is a difficult and high-cost process that may involve technical and non-technical resources and challenges. There is evidence that a lack of understanding of, and preparedness for, cloud migration underpins many failures to achieve organisations' goals. The main goal of this article is to identify the most important challenging activities in moving legacy systems to cloud platforms from the perspective of the reengineering process. Through a combination of bottom-up and top-down analysis, a set of common activities is derived from the extant cloud computing literature. These are expressed as a model and are validated using a population of 104 shortlisted and randomly selected domain experts from different industry sectors. We used a Web-based survey questionnaire to collect data and analysed them using SPSS Sample T-Test. The results of this study highlight the most important and critical challenges that should be addressed by various roles within a legacy-to-cloud migration endeavour. The study provides an overall understanding of this process, including commonly occurring activities, concerns and recommendations. In addition, the findings of this study constitute a practical guide for conducting this transition. This guide is platform agnostic and independent of any specific migration scenario, cloud platform, or application domain. Keywords: Cloud Computing, Legacy Systems, Cloud Migration, Cloud Migration Process

LG · Apr 28, 2018
Credit risk prediction in an imbalanced social lending environment

Anahita Namvar, Mohammad Siami, Fethi Rabhi et al.

Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan, particularly in peer-to-peer lending where class imbalance problems are prevalent. However, few credit risk prediction models for social lending consider imbalanced data and, further, the best resampling technique to use with imbalanced data is still controversial. In an attempt to address these problems, this paper presents an empirical comparison of various combinations of classifiers and resampling techniques within a novel risk assessment methodology that incorporates imbalanced data. The credit predictions from each combination are evaluated with a G-mean measure to avoid bias towards the majority class, which has not been considered in similar studies. The results reveal that combining random forest and random under-sampling may be an effective strategy for calculating the credit risk associated with loan applicants in social lending markets.
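
To make the evaluation idea concrete, here is a minimal sketch of the G-mean measure and of random under-sampling. It assumes the usual binary definition of G-mean as the geometric mean of sensitivity and specificity; the abstract does not spell out the formula, and the function names `g_mean` and `random_under_sample` are illustrative, not taken from the paper.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

With two defaulters among ten loans, a classifier that always predicts "repaid" reaches 80% accuracy but a G-mean of 0, because its sensitivity on the minority class is 0. This is the majority-class bias the measure is meant to expose.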

CR · Feb 20, 2018
KASR: A Reliable and Practical Approach to Attack Surface Reduction of Commodity OS Kernels

Zhi Zhang, Yueqiang Cheng, Surya Nepal et al.

Commodity OS kernels have broad attack surfaces due to the large code base and the numerous features such as device drivers. For a real-world use case (e.g., an Apache Server), many kernel services are unused and only a small amount of kernel code is used. Within the used code, a certain part is invoked only at runtime while the rest is executed during the startup and/or shutdown phases of the kernel's lifetime. In this paper, we propose a reliable and practical system, named KASR, which transparently reduces attack surfaces of commodity OS kernels at runtime without requiring their source code. The KASR system, residing in a trusted hypervisor, achieves the attack surface reduction through a two-step approach: (1) reliably depriving unused code of executable permissions, and (2) transparently segmenting used code and selectively activating it. We implement a prototype of KASR on the Xen-4.8.2 hypervisor and evaluate its security effectiveness on Linux kernel-4.4.0-87-generic. Our evaluation shows that KASR reduces the kernel attack surface by 64% and trims off 40% of CVE vulnerabilities. Besides, KASR successfully detects and blocks all 6 real-world kernel rootkits. We measure its performance overhead with three benchmark tools (i.e., SPECINT, httperf and bonnie++). The experimental results indicate that KASR imposes less than 1% performance overhead (compared to an unmodified Xen hypervisor) on all the benchmarks.

AI · Mar 30, 2014
Enhancing Automated Decision Support across Medical and Oral Health Domains with Semantic Web Technologies

Tejal Shah, Fethi Rabhi, Pradeep Ray et al.

Research has shown that the general health and oral health of an individual are closely related. Accordingly, the current practice of isolating the information bases of the medical and oral health domains can be dangerous and detrimental to the health of the individual. However, technical issues such as heterogeneous data collection and storage formats, limited sharing of patient information, and a lack of decision support over the shared information are the principal reasons for the current state of affairs. To address these issues, this research investigates the development and application of a cross-domain ontology and rules to build an evidence-based and reusable knowledge base consisting of the inter-dependent conditions from the two domains. Through an example implementation of the knowledge base in Protege, we demonstrate the effectiveness of our approach in reasoning over and providing decision support for cross-domain patient information.