Palvi Aggarwal

8.6 CR Jun 16, 2025 Code

Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability

Shova Kuikel, Aritran Piplai, Palvi Aggarwal

Phishing attacks remain one of the most prevalent and persistent cybersecurity threats, with attackers continuously evolving and intensifying their tactics to evade general detection systems. Despite significant advances in artificial intelligence and machine learning, faithfully reproducing the interpretable reasoning, classification, and explainability that underpin phishing judgments remains challenging. With recent advances in Natural Language Processing, Large Language Models (LLMs) show promise for improving domain-specific phishing classification tasks. However, enhancing the reliability and robustness of classification models requires not only accurate predictions from LLMs but also consistent and trustworthy explanations that align with those predictions. A key question therefore remains: can LLMs not only classify phishing emails accurately but also generate explanations that are reliably aligned with their predictions and internally self-consistent? To answer this question, we fine-tuned transformer-based models, including BERT, Llama models, and Wizard, to improve domain relevance and tailor them to phishing-specific distinctions, using Binary Sequence Classification, Contrastive Learning (CL), and Direct Preference Optimization (DPO). We then examined their performance in phishing classification and explainability by applying the ConsistenCy measure based on SHAPley values (CC SHAP), which measures prediction-explanation token alignment, to test each model's internal faithfulness and consistency and to uncover the rationale behind its predictions and reasoning. Overall, our findings show that Llama models exhibit stronger prediction-explanation token alignment, with higher CC SHAP scores, despite lacking reliable decision-making accuracy, whereas Wizard achieves better prediction accuracy but lower CC SHAP scores.

6.6 CR Aug 25, 2021

Decoys in Cybersecurity: An Exploratory Study to Test the Effectiveness of 2-sided Deception

Palvi Aggarwal, Yinuo Du, Kuldeep Singh et al.

One widely used cyber deception technique is decoying, where defenders create fictitious machines (i.e., honeypots) to lure attackers. Honeypots are deployed to entice attackers, but their effectiveness depends on their configuration, which influences whether attackers judge them to be "real" machines. In this work, we study two-sided deception, where we manipulate the observed configuration of both honeypots and real machines. The idea is to improve cyberdefense by either making honeypots "look like" real machines or making real machines "look like" honeypots. We identify the modifiable features of both real machines and honeypots and conceal these features to different degrees. In an experiment, we study three conditions: default features on both honeypots and real machines, concealed honeypots only, and concealed honeypots and real machines. We use a network of 40 machines, 20 of which are honeypots. We manipulate the features of the machines and, using an experimental testbed (HackIT), test the effectiveness of the decoying strategies against human attackers. Results indicate that either form of deception (concealing honeypots only, or concealing both honeypots and real machines) is better than no deception at all. We observe that attackers attempted more exploits on honeypots and exfiltrated more data from honeypots in the two deception conditions. However, attacks on honeypots and data exfiltration did not differ between the two deception conditions. The results inform cybersecurity defenders on how to manipulate the observable features of honeypots and real machines to create uncertainty for attackers and improve cyberdefense.
