Amir Mahdi Sadeghzadeh

CR
h-index22
5papers
140citations
Novelty40%
AI Score36

5 Papers

CLSep 8, 2025Code
EPT Benchmark: Evaluation of Persian Trustworthiness in Large Language Models

Mohammad Reza Mirbagheri, Mohammad Mahdi Mirkamali, Zahra Motoshaker Arani et al.

Large Language Models (LLMs), trained on extensive datasets using advanced deep learning architectures, have demonstrated remarkable performance across a wide range of language tasks, becoming a cornerstone of modern AI technologies. However, ensuring their trustworthiness remains a critical challenge, as reliability is essential not only for accurate performance but also for upholding ethical, cultural, and social values. Careful alignment of training data and culturally grounded evaluation criteria are vital for developing responsible AI systems. In this study, we introduce the EPT (Evaluation of Persian Trustworthiness) metric, a culturally informed benchmark specifically designed to assess the trustworthiness of LLMs across six key aspects: truthfulness, safety, fairness, robustness, privacy, and ethical alignment. We curated a labeled dataset and evaluated the performance of several leading models - including ChatGPT, Claude, DeepSeek, Gemini, Grok, LLaMA, Mistral, and Qwen - using both automated LLM-based and human assessments. Our results reveal significant deficiencies in the safety dimension, underscoring the urgent need for focused attention on this critical aspect of model behavior. Furthermore, our findings offer valuable insights into the alignment of these models with Persian ethical-cultural values and highlight critical gaps and opportunities for advancing trustworthy and culturally responsible AI. The dataset is publicly available at: https://github.com/Rezamirbagheri110/EPT-Benchmark.

CRAug 27, 2021
Mal2GCN: A Robust Malware Detection Approach Using Deep Graph Convolutional Networks With Non-Negative Weights

Omid Kargarnovin, Amir Mahdi Sadeghzadeh, Rasool Jalili

With the growing pace of using Deep Learning (DL) to solve various problems, securing these models against adversaries has become one of the main concerns of researchers. Recent studies have shown that DL-based malware detectors are vulnerable to adversarial examples. An adversary can create carefully crafted adversarial examples to evade DL-based malware detectors. In this paper, we propose Mal2GCN, a robust malware detection model that uses Function Call Graph (FCG) representation of executable files combined with Graph Convolution Network (GCN) to detect Windows malware. Since FCG representation of executable files is more robust than raw byte sequence representation, numerous proposed adversarial example generating methods are ineffective in evading Mal2GCN. Moreover, we use the non-negative training method to transform Mal2GCN to a monotonically non-decreasing function; thereby, it becomes theoretically robust against appending attacks. We then present a black-box source code-based adversarial malware generation approach that can be used to evaluate the robustness of malware detection models against real-world adversaries. The proposed approach injects adversarial codes into the various locations of malware source codes to evade malware detection models. The experiments demonstrate that Mal2GCN with non-negative weights has high accuracy in detecting Windows malware, and it is also robust against adversarial attacks that add benign features to the Malware source code.

LGJun 21, 2021
HODA: Hardness-Oriented Detection of Model Extraction Attacks

Amir Mahdi Sadeghzadeh, Amir Mohammad Sobhanian, Faezeh Dehghan et al.

Model Extraction attacks exploit the target model's prediction API to create a surrogate model in order to steal or reconnoiter the functionality of the target model in the black-box setting. Several recent studies have shown that a data-limited adversary who has no or limited access to the samples from the target model's training data distribution can use synthesis or semantically similar samples to conduct model extraction attacks. In this paper, we define the hardness degree of a sample using the concept of learning difficulty. The hardness degree of a sample depends on the epoch number that the predicted label of that sample converges. We investigate the hardness degree of samples and demonstrate that the hardness degree histogram of a data-limited adversary's sample sequences is distinguishable from the hardness degree histogram of benign users' samples sequences. We propose Hardness-Oriented Detection Approach (HODA) to detect the sample sequences of model extraction attacks. The results demonstrate that HODA can detect the sample sequences of model extraction attacks with a high success rate by only monitoring 100 samples of them, and it outperforms all previous model extraction detection methods.

CRDec 20, 2020
AWA: Adversarial Website Adaptation

Amir Mahdi Sadeghzadeh, Behrad Tajali, Rasool Jalili

One of the most important obligations of privacy-enhancing technologies is to bring confidentiality and privacy to users' browsing activities on the Internet. The website fingerprinting attack enables a local passive eavesdropper to predict the target user's browsing activities even she uses anonymous technologies, such as VPNs, IPsec, and Tor. Recently, the growth of deep learning empowers adversaries to conduct the website fingerprinting attack with higher accuracy. In this paper, we propose a new defense against website fingerprinting attack using adversarial deep learning approaches called Adversarial Website Adaptation (AWA). AWA creates a transformer set in each run so that each website has a unique transformer. Each transformer generates adversarial traces to evade the adversary's classifier. AWA has two versions, including Universal AWA (UAWA) and Non-Universal AWA (NUAWA). Unlike NUAWA, there is no need to access the entire trace of a website in order to generate an adversarial trace in UAWA. We accommodate secret random elements in the training phase of transformers in order for AWA to generate various sets of transformers in each run. We run AWA several times and create multiple sets of transformers. If an adversary and a target user select different sets of transformers, the accuracy of adversary's classifier is almost 19.52% and 31.94% with almost 22.28% and 26.28% bandwidth overhead in UAWA and NUAWA, respectively. If a more powerful adversary generates adversarial traces through multiple sets of transformers and trains a classifier on them, the accuracy of adversary's classifier is almost 49.10% and 25.93% with almost 62.52% and 64.33% bandwidth overhead in UAWA and NUAW, respectively.

CRMar 3, 2020
Adversarial Network Traffic: Towards Evaluating the Robustness of Deep Learning-Based Network Traffic Classification

Amir Mahdi Sadeghzadeh, Saeed Shiravi, Rasool Jalili

Network traffic classification is used in various applications such as network traffic management, policy enforcement, and intrusion detection systems. Although most applications encrypt their network traffic and some of them dynamically change their port numbers, Machine Learning (ML) and especially Deep Learning (DL)-based classifiers have shown impressive performance in network traffic classification. In this paper, we evaluate the robustness of DL-based network traffic classifiers against Adversarial Network Traffic (ANT). ANT causes DL-based network traffic classifiers to predict incorrectly using Universal Adversarial Perturbation (UAP) generating methods. Since there is no need to buffer network traffic before sending ANT, it is generated live. We partition the input space of the DL-based network traffic classification into three categories: packet classification, flow content classification, and flow time series classification. To generate ANT, we propose three new attacks injecting UAP into network traffic. AdvPad attack injects a UAP into the content of packets to evaluate the robustness of packet classifiers. AdvPay attack injects a UAP into the payload of a dummy packet to evaluate the robustness of flow content classifiers. AdvBurst attack injects a specific number of dummy packets with crafted statistical features based on a UAP into a selected burst of a flow to evaluate the robustness of flow time series classifiers. The results indicate injecting a little UAP into network traffic, highly decreases the performance of DL-based network traffic classifiers in all categories.