Hadjer Benkraouda

h-index: 7

4 papers

185 citations

Novelty: 57%

AI Score: 36

Ranked #98,967 of 194,257 authors (top 51%); #2,437 in CR (top 36%)

4 Papers

18.0 | CR | Jun 29

Words Speak Louder Than Code: Investigating Cognitive Heuristics in LLM-Based Code Vulnerability Detection

Asif Shahriar, Hongyu Cai, Hadjer Benkraouda et al.

Researchers and practitioners increasingly apply Large Language Models (LLMs) for automated vulnerability detection. Recent work has shown that LLMs are susceptible to the same cognitive heuristics that bias human judgment. Yet, no work has investigated whether these heuristics affect a model's assessment of code vulnerabilities. In this paper, we present the first systematic exploration of cognitive heuristics in LLM-driven code vulnerability detection. We introduce a controlled framework that holds the code fixed and only varies the surrounding context to trigger three cognitive heuristics: the halo effect through author attribution, the framing effect through task objectives and consequences, and the anchoring effect through prior analysis results. Within this framework, we evaluate eight LLMs across three programming languages and perform both quantitative and code-level analyses. Our findings demonstrate that all evaluated models are susceptible to these heuristics. Cross-model average susceptibility is highest for framing at 33.2%, followed by anchoring at 23.5% and halo at 18.4%. Code-level analysis reveals that vulnerabilities that require semantic reasoning for detection are more susceptible to cognitive heuristics than those identifiable through pattern matching. Furthermore, models often change their verdict from safe to vulnerable based on the cognitive condition, without accurately identifying the actual vulnerability. To highlight the practical impact, we demonstrate a proof-of-concept black-box cognitive attack that can suppress up to 97% of previously detected vulnerabilities. These findings indicate that cognitive susceptibility is a consistent and exploitable property of LLM-based vulnerability detection.
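The controlled setup described in this abstract (fixed code, varying surrounding context) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the context strings, the prompt wording, the `flip_rate` metric, and the `toy_detector` stand-in are hypothetical and are not taken from the paper, which may define susceptibility differently.

```python
from typing import Callable, Dict, List

# Hypothetical context templates; the paper's actual prompts are not given here.
CONTEXTS: Dict[str, str] = {
    "baseline": "",
    "halo": "This code was written by a senior security engineer.\n",
    "framing": "Flagging safe code as vulnerable will delay a critical release.\n",
    "anchoring": "A previous static analysis found no issues in this code.\n",
}


def build_prompt(code: str, condition: str) -> str:
    """Prepend the context for `condition`; the code itself is never modified."""
    question = "Is the following code vulnerable? Answer 'vulnerable' or 'safe'.\n\n"
    return f"{CONTEXTS[condition]}{question}{code}"


def flip_rate(snippets: List[str], condition: str,
              detector: Callable[[str], str]) -> float:
    """Fraction of snippets whose verdict differs from the baseline verdict.

    This is an assumed proxy for susceptibility, not the paper's definition.
    """
    if not snippets:
        return 0.0
    flips = sum(
        detector(build_prompt(code, "baseline")) != detector(build_prompt(code, condition))
        for code in snippets
    )
    return flips / len(snippets)


def toy_detector(prompt: str) -> str:
    """Stand-in for an LLM call, deliberately biased by an anchoring cue."""
    if "found no issues" in prompt:
        return "safe"
    return "vulnerable" if "strcpy" in prompt else "safe"


snippets = ["strcpy(buf, input);", "int x = 1;"]
anchoring_rate = flip_rate(snippets, "anchoring", toy_detector)
halo_rate = flip_rate(snippets, "halo", toy_detector)
```

In practice `toy_detector` would be replaced by a call to the model under evaluation. Because the code is byte-identical across conditions, any change in verdict can be attributed to the context alone.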

22.0 | CR | Jul 25, 2025

PurpCode: Reasoning for Safer Code Generation

Jiawei Liu, Nirav Diwan, Zhe Wang et al.

We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learning, which explicitly teaches the model to reference cybersafety rules to generate vulnerability-free code and to avoid facilitating malicious cyberactivities; and (ii) Reinforcement Learning, which optimizes model safety and preserves model utility through diverse, multi-objective reward mechanisms. To empower the training pipelines with comprehensive cybersafety data, we conduct internal red-teaming to synthesize comprehensive and high-coverage prompts based on real-world tasks for inducing unsafe cyberactivities in the model. Based on PurpCode, we develop a reasoning-based coding model, namely PurpCode-32B, which demonstrates state-of-the-art cybersafety, outperforming various frontier models. Meanwhile, our alignment method decreases the model overrefusal rates in both general and cybersafety-specific scenarios, while preserving model utility in both code generation and common security knowledge.

12.3 | CR | Sep 21, 2021

Attacks on Visualization-Based Malware Detection: Balancing Effectiveness and Executability

Hadjer Benkraouda, Jingyu Qian, Hung Quoc Tran et al.

With the rapid development of machine learning for image classification, researchers have found new applications of visualization techniques in malware detection. By converting binary code into images, researchers have shown satisfactory results in applying machine learning to extract features that are difficult to discover manually. Such visualization-based malware detection methods can capture malware patterns from many different malware families and improve malware detection speed. On the other hand, recent research has also shown adversarial attacks against such visualization-based malware detection. Attackers can generate adversarial examples by perturbing the malware binary in non-reachable regions, such as padding at the end of the binary. Alternatively, attackers can perturb the malware image embedding and then verify the executability of the malware post-transformation. One major limitation of the first attack scenario is that a simple pre-processing step can remove the perturbations before classification. For the second attack scenario, it is hard to maintain the original malware's executability and functionality. In this work, we provide a literature review of existing malware visualization techniques and attacks against them. We summarize the limitations of previous work and design a new adversarial example attack against visualization-based malware detection that can evade pre-processing filtering and maintain the original malware functionality. We test our attack on a public malware dataset and achieve a 98% success rate.
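As a rough illustration of the byte-to-image conversion and the end-of-binary padding perturbation mentioned in this abstract, here is a minimal sketch. It is not the authors' pipeline: the function names, the one-byte-per-pixel grayscale mapping, and the fixed row width are assumptions chosen for simplicity.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Map each byte to one pixel (0-255) and wrap into rows of `width` pixels.

    The final row is zero-padded so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows: List[List[int]] = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def append_padding(data: bytes, padding: bytes) -> bytes:
    """Append bytes after the end of the binary (a non-reachable region)."""
    return data + padding


# Hypothetical 16-byte "binary" used only for demonstration.
original = bytes(range(16))
perturbed = append_padding(original, b"\xff" * 8)

original_image = bytes_to_grayscale(original, width=8)
perturbed_image = bytes_to_grayscale(perturbed, width=8)
```

Here the appended bytes only add a new trailing row and leave the earlier rows unchanged. This is consistent with the limitation the abstract notes: perturbations confined to trailing padding are easy for a pre-processing step to strip.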

14.7 | CV | Jun 20, 2020

FaceHack: Triggering backdoored facial recognition systems using facial characteristics

Esha Sarkar, Hadjer Benkraouda, Michail Maniatakos

Recent advances in Machine Learning (ML) have opened up new avenues for its extensive use in real-world applications. Facial recognition, specifically, is used in applications ranging from simple friend suggestions on social-media platforms to critical security applications for biometric validation in automated immigration at airports. Considering these scenarios, security vulnerabilities in such ML algorithms pose serious threats with severe outcomes. Recent work demonstrated that Deep Neural Networks (DNNs), typically used in facial recognition systems, are susceptible to backdoor attacks; in other words, the DNNs turn malicious in the presence of a unique trigger. Adhering to common characteristics for being unnoticeable, an ideal trigger is small, localized, and typically not a part of the main image. Therefore, detection mechanisms have focused on detecting these distinct trigger-based outliers statistically or through their reconstruction. In this work, we demonstrate that specific changes to facial characteristics may also be used to trigger malicious behavior in an ML model. The changes in the facial attributes may be embedded artificially using social-media filters or introduced naturally using movements in facial muscles. By construction, our triggers are large, adaptive to the input, and spread over the entire image. We evaluate the success of the attack and validate that it does not interfere with the performance criteria of the model. We also substantiate the undetectability of our triggers by exhaustively testing them with state-of-the-art defenses.