Ali Chehab

CL
h-index22
13papers
107citations
Novelty34%
AI Score40

13 Papers

CLApr 30, 2025Code
Fine-Tuning LLMs for Low-Resource Dialect Translation: The Case of Lebanese

Silvana Yakhni, Ali Chehab

This paper examines the effectiveness of Large Language Models (LLMs) in translating the low-resource Lebanese dialect, focusing on the impact of culturally authentic data versus larger translated datasets. We compare three fine-tuning approaches: Basic, contrastive, and grammar-hint tuning, using open-source Aya23 models. Experiments reveal that models fine-tuned on a smaller but culturally aware Lebanese dataset (LW) consistently outperform those trained on larger, non-native data. The best results were achieved through contrastive fine-tuning paired with contrastive prompting, which indicates the benefits of exposing translation models to bad examples. In addition, to ensure authentic evaluation, we introduce LebEval, a new benchmark derived from native Lebanese content, and compare it to the existing FLoRes benchmark. Our findings challenge the "More Data is Better" paradigm and emphasize the crucial role of cultural authenticity in dialectal translation. We made our datasets and code available on Github.

SENov 28, 2025Code
Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection

Fouad Trad, Ali Chehab

Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in specialized tasks. However, its effectiveness depends heavily on the selection and quality of in-context examples, particularly in complex domains. In this work, we examine retrieval-augmented prompting as a strategy to improve few-shot performance in code vulnerability detection, where the goal is to identify one or more security-relevant weaknesses present in a given code snippet from a predefined set of vulnerability categories. We perform a systematic evaluation using the Gemini-1.5-Flash model across three approaches: (1) standard few-shot prompting with randomly selected examples, (2) retrieval-augmented prompting using semantically similar examples, and (3) retrieval-based labeling, which assigns labels based on retrieved examples without model inference. Our results show that retrieval-augmented prompting consistently outperforms the other prompting strategies. At 20 shots, it achieves an F1 score of 74.05% and a partial match accuracy of 83.90%. We further compare this approach against zero-shot prompting and several fine-tuned models, including Gemini-1.5-Flash and smaller open-source models such as DistilBERT, DistilGPT2, and CodeBERT. Retrieval-augmented prompting outperforms both zero-shot (F1 score: 36.35%, partial match accuracy: 20.30%) and fine-tuned Gemini (F1 score: 59.31%, partial match accuracy: 53.10%), while avoiding the training time and cost associated with model fine-tuning. On the other hand, fine-tuning CodeBERT yields higher performance (F1 score: 91.22%, partial match accuracy: 91.30%) but requires additional training, maintenance effort, and resources.

CLNov 29, 2024
To Ensemble or Not: Assessing Majority Voting Strategies for Phishing Detection with Large Language Models

Fouad Trad, Ali Chehab

The effectiveness of Large Language Models (LLMs) significantly relies on the quality of the prompts they receive. However, even when processing identical prompts, LLMs can yield varying outcomes due to differences in their training processes. To leverage the collective intelligence of multiple LLMs and enhance their performance, this study investigates three majority voting strategies for text classification, focusing on phishing URL detection. The strategies are: (1) a prompt-based ensemble, which utilizes majority voting across the responses generated by a single LLM to various prompts; (2) a model-based ensemble, which entails aggregating responses from multiple LLMs to a single prompt; and (3) a hybrid ensemble, which combines the two methods by sending different prompts to multiple LLMs and then aggregating their responses. Our analysis shows that ensemble strategies are most suited in cases where individual components exhibit equivalent performance levels. However, when there is a significant discrepancy in individual performance, the effectiveness of the ensemble method may not exceed that of the highest-performing single LLM or prompt. In such instances, opting for ensemble techniques is not recommended.

AIDec 3, 2024
Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction

Fouad Trad, Ali Chehab

With the rise of sophisticated phishing attacks, there is a growing need for effective and economical detection solutions. This paper explores the use of large multimodal agents, specifically Gemini 1.5 Flash and GPT-4o mini, to analyze both URLs and webpage screenshots via APIs, thus avoiding the complexities of training and maintaining AI systems. Our findings indicate that integrating these two data types substantially enhances detection performance over using either type alone. However, API usage incurs costs per query that depend on the number of input and output tokens. To address this, we propose a two-tiered agentic approach: initially, one agent assesses the URL, and if inconclusive, a second agent evaluates both the URL and the screenshot. This method not only maintains robust detection performance but also significantly reduces API costs by minimizing unnecessary multi-input queries. Cost analysis shows that with the agentic approach, GPT-4o mini can process about 4.2 times as many websites per $100 compared to the multimodal approach (107,440 vs. 25,626), and Gemini 1.5 Flash can process about 2.6 times more websites (2,232,142 vs. 862,068). These findings underscore the significant economic benefits of the agentic approach over the multimodal method, providing a viable solution for organizations aiming to leverage advanced AI for phishing detection while controlling expenses.

CLDec 14, 2024
Streamlining Systematic Reviews: A Novel Application of Large Language Models

Fouad Trad, Ryan Yammine, Jana Charafeddine et al.

Systematic reviews (SRs) are essential for evidence-based guidelines but are often limited by the time-consuming nature of literature screening. We propose and evaluate an in-house system based on Large Language Models (LLMs) for automating both title/abstract and full-text screening, addressing a critical gap in the literature. Using a completed SR on Vitamin D and falls (14,439 articles), the LLM-based system employed prompt engineering for title/abstract screening and Retrieval-Augmented Generation (RAG) for full-text screening. The system achieved an article exclusion rate (AER) of 99.5%, specificity of 99.6%, a false negative rate (FNR) of 0%, and a negative predictive value (NPV) of 100%. After screening, only 78 articles required manual review, including all 20 identified by traditional methods, reducing manual screening time by 95.5%. For comparison, Rayyan, a commercial tool for title/abstract screening, achieved an AER of 72.1% and FNR of 5% when including articles Rayyan considered as undecided or likely to include. Lowering Rayyan's inclusion thresholds improved FNR to 0% but increased screening time. By addressing both screening phases, the LLM-based system significantly outperformed Rayyan and traditional methods, reducing total screening time to 25.5 hours while maintaining high accuracy. These findings highlight the transformative potential of LLMs in SR workflows by offering a scalable, efficient, and accurate solution, particularly for the full-text screening phase, which has lacked automation tools.

AIMar 26, 2024
Evaluating the Efficacy of Prompt-Engineered Large Multimodal Models Versus Fine-Tuned Vision Transformers in Image-Based Security Applications

Fouad Trad, Ali Chehab

The success of Large Language Models (LLMs) has led to a parallel rise in the development of Large Multimodal Models (LMMs), which have begun to transform a variety of applications. These sophisticated multimodal models are designed to interpret and analyze complex data by integrating multiple modalities such as text and images, thereby opening new avenues for a range of applications. This paper investigates the applicability and effectiveness of prompt-engineered LMMs that process both images and text, including models such as LLaVA, BakLLaVA, Moondream, Gemini-pro-vision, and GPT-4o, compared to fine-tuned Vision Transformer (ViT) models in addressing critical security challenges. We focus on two distinct security tasks: 1) a visually evident task of detecting simple triggers, such as small pixel variations in images that could be exploited to access potential backdoors in the models, and 2) a visually non-evident task of malware classification through visual representations. In the visually evident task, some LMMs, such as Gemini-pro-vision and GPT-4o, have demonstrated the potential to achieve good performance with careful prompt engineering, with GPT-4o achieving the highest accuracy and F1-score of 91.9\% and 91\%, respectively. However, the fine-tuned ViT models exhibit perfect performance in this task due to its simplicity. For the visually non-evident task, the results highlight a significant divergence in performance, with ViT models achieving F1-scores of 97.11\% in predicting 25 malware classes and 97.61\% in predicting 5 malware families, whereas LMMs showed suboptimal performance despite iterative prompt improvements. This study not only showcases the strengths and limitations of prompt-engineered LMMs in cybersecurity applications but also emphasizes the unmatched efficacy of fine-tuned ViT models for precise and dependable tasks.

CRMay 6, 2025
Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis

Fouad Trad, Ali Chehab

The rise of QR code based phishing ("Quishing") poses a growing cybersecurity threat, as attackers increasingly exploit QR codes to bypass traditional phishing defenses. Existing detection methods predominantly focus on URL analysis, which requires the extraction of the QR code payload, and may inadvertently expose users to malicious content. Moreover, QR codes can encode various types of data beyond URLs, such as Wi-Fi credentials and payment information, making URL-based detection insufficient for broader security concerns. To address these gaps, we propose the first framework for quishing detection that directly analyzes QR code structure and pixel patterns without extracting the embedded content. We generated a dataset of phishing and benign QR codes and we used it to train and evaluate multiple machine learning models, including Logistic Regression, Decision Trees, Random Forest, Naive Bayes, LightGBM, and XGBoost. Our best-performing model (XGBoost) achieves an AUC of 0.9106, demonstrating the feasibility of QR-centric detection. Through feature importance analysis, we identify key visual indicators of malicious intent and refine our feature set by removing non-informative pixels, improving performance to an AUC of 0.9133 with a reduced feature space. Our findings reveal that the structural features of QR code correlate strongly with phishing risk. This work establishes a foundation for quishing mitigation and highlights the potential of direct QR analysis as a critical layer in modern phishing defenses.

LGDec 15, 2025
On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models

Ali Al Sahili, Ali Chehab, Razane Tajeddine

Large Language Models (LLMs) are prone to memorizing training data, which poses serious privacy risks. Two of the most prominent concerns are training data extraction and Membership Inference Attacks (MIAs). Prior research has shown that these threats are interconnected: adversaries can extract training data from an LLM by querying the model to generate a large volume of text and subsequently applying MIAs to verify whether a particular data point was included in the training set. In this study, we integrate multiple MIA techniques into the data extraction pipeline to systematically benchmark their effectiveness. We then compare their performance in this integrated setting against results from conventional MIA benchmarks, allowing us to evaluate their practical utility in real-world extraction scenarios.

IRNov 28, 2025
Chained Prompting for Better Systematic Review Search Strategies

Fatima Nasser, Fouad Trad, Ammar Mohanna et al.

Systematic reviews require the use of rigorously designed search strategies to ensure both comprehensive retrieval and minimization of bias. Conventional manual approaches, although methodologically systematic, are resource-intensive and susceptible to subjectivity, whereas heuristic and automated techniques frequently under-perform in recall unless supplemented by extensive expert input. We introduce a Large Language Model (LLM)-based chained prompt engineering framework for the automated development of search strategies in systematic reviews. The framework replicates the procedural structure of manual search design while leveraging LLMs to decompose review objectives, extract and formalize PICO elements, generate conceptual representations, expand terminologies, and synthesize Boolean queries. In addition to query construction, the framework exhibits superior performance in generating well-structured PICO elements relative to existing methods, thereby strengthening the foundation for high-recall search strategies. Evaluation on a subset of the LEADSInstruct dataset demonstrates that the framework attains a 0.9 average recall. These results significantly exceed the performance of existing approaches. Error analysis further highlights the critical role of precise objective specification and terminological alignment in optimizing retrieval effectiveness. These findings confirm the capacity of LLM-based pipelines to yield transparent, reproducible, and high-performing search strategies, and highlight their potential as scalable instruments for supporting evidence synthesis and evidence-based practice.

CRMar 31, 2021
Digital Forensics vs. Anti-Digital Forensics: Techniques, Limitations and Recommendations

Jean-Paul A. Yaacoub, Hassan N. Noura, Ola Salman et al.

The number of cyber attacks has increased tremendously in the last few years. This resulted into both human and financial losses at the individual and organization levels. Recently, cyber-criminals are leveraging new skills and capabilities by employing anti-forensics activities, techniques and tools to cover their tracks and evade any possible detection. Consequently, cyber-attacks are becoming more efficient and more sophisticated. Therefore, traditional cryptographic and non-cryptographic solutions and access control systems are no longer enough to prevent such cyber attacks, especially in terms of acquiring evidence for attack investigation. Hence, the need for well-defined, sophisticated, and advanced forensics investigation tools are highly required to track down cyber criminals and to reduce the number of cyber crimes. This paper reviews the different forensics and anti-forensics methods, tools, techniques, types, and challenges, while also discussing the rise of the anti-anti-forensics as a new forensics protection mechanism against anti-forensics activities. This would help forensics investigators to better understand the different anti-forensics tools, methods and techniques that cyber criminals employ while launching their attacks. Moreover, the limitations of the current forensics techniques are discussed, especially in terms of issues and challenges. Finally, this paper presents a holistic view from a literature point of view over the forensics domain and also helps other fellow colleagues in their quest to further understand the digital forensics domain.

CRMar 28, 2021
A Survey on Ethical Hacking: Issues and Challenges

Jean-Paul A. Yaacoub, Hassan N. Noura, Ola Salman et al.

Security attacks are growing in an exponential manner and their impact on existing systems is seriously high and can lead to dangerous consequences. However, in order to reduce the effect of these attacks, penetration tests are highly required, and can be considered as a suitable solution for this task. Therefore, the main focus of this paper is to explain the technical and non-technical steps of penetration tests. The objective of penetration tests is to make existing systems and their corresponding data more secure, efficient and resilient. In other terms, pen testing is a simulated attack with the goal of identifying any exploitable vulnerability or/and a security gap. In fact, any identified exploitable vulnerability will be used to conduct attacks on systems, devices, or personnel. This growing problem should be solved and mitigated to reach better resistance against these attacks. Moreover, the advantages and limitations of penetration tests are also listed. The main issue of penetration tests that it is efficient to detect known vulnerabilities. Therefore, in order to resist unknown vulnerabilities, a new kind of modern penetration tests is required, in addition to reinforcing the use of shadows honeypots. This can also be done by reinforcing the anomaly detection of intrusion detection/prevention system. In fact, security is increased by designing an efficient cooperation between the different security elements and penetration tests.

IVApr 14, 2019
Automatic Target Detection for Sparse Hyperspectral Images

Ahmad W. Bitar, Jean-Philippe Ovarlez, Loong-Fah Cheong et al.

In this work, a novel target detector for hyperspectral imagery is developed. The detector is independent on the unknown covariance matrix, behaves well in large dimensions, distributional free, invariant to atmospheric effects, and does not require a background dictionary to be constructed. Based on a modification of the robust principal component analysis (RPCA), a given hyperspectral image (HSI) is regarded as being made up of the sum of a low-rank background HSI and a sparse target HSI that contains the targets based on a pre-learned target dictionary specified by the user. The sparse component is directly used for the detection, that is, the targets are simply detected at the non-zero entries of the sparse target HSI. Hence, a novel target detector is developed, which is simply a sparse HSI generated automatically from the original HSI, but containing only the targets with the background is suppressed. The detector is evaluated on real experiments, and the results of which demonstrate its effectiveness for hyperspectral target detection especially when the targets are well matched to the surroundings.

CVNov 21, 2017
Efficient Implementation of a Recognition System Using the Cortex Ventral Stream Model

Ahmad W. Bitar, Mohammad M. Mansour, Ali Chehab

In this paper, an efficient implementation for a recognition system based on the original HMAX model of the visual cortex is proposed. Various optimizations targeted to increase accuracy at the so-called layers S1, C1, and S2 of the HMAX model are proposed. At layer S1, all unimportant information such as illumination and expression variations are eliminated from the images. Each image is then convolved with 64 separable Gabor filters in the spatial domain. At layer C1, the minimum scales values are exploited to be embedded into the maximum ones using the additive embedding space. At layer S2, the prototypes are generated in a more efficient way using Partitioning Around Medoid (PAM) clustering algorithm. The impact of these optimizations in terms of accuracy and computational complexity was evaluated on the Caltech101 database, and compared with the baseline performance using support vector machine (SVM) and nearest neighbor (NN) classifiers. The results show that our model provides significant improvement in accuracy at the S1 layer by more than 10% where the computational complexity is also reduced. The accuracy is slightly increased for both approximations at the C1 and S2 layers.