Rina Mishra

63.9CRApr 19Code

GuardPhish: Securing Open-Source LLMs from Phishing Abuse

Rina Mishra, Gaurav Varshney, Doddipatla Sesha Sahithi

The rapid adoption of open-source Large Language Models (LLMs) in offline and enterprise environments has introduced a largely unexamined security risk like susceptibility to adversarial phishing prompts under static safety configurations. In this work, we systematically investigate this vulnerability through GuardPhish, a large scale multi-vector phishing prompt dataset comprising 70,015 samples spanning web, email, SMS, and voice attack scenarios derived from real world campaigns. Using a deterministic five model ensemble for labeling, we achieve near perfect inter model agreement (Fleiss kappa = 0.9141), with residual disagreements resolved through expert adjudication. By evaluating eight open-source LLMs under fully offline inference conditions, we uncover a substantial enforcement gap like models that correctly identify phishing intent with detection rates up to 96% nevertheless generate actionable phishing content from identical prompts, with attack success rates reaching 98.5% in voice-based scenarios. These findings demonstrate that intent classification alone does not guarantee generative refusal in the absence of dynamic guardrails. To mitigate this risk, we train transformer based classifiers on GuardPhish, achieving up to 98.27% accuracy as modular pre-generation filters deployable without modifying the underlying generative model. Our results highlight a critical weakness in current open-source LLM deployments and provide a reproducible foundation for strengthening defenses against phishing and social engineering attacks.

40.0CRApr 1

Jailbreaking Generative AI: Multivector Phishing Threats and Transformer based Defenses

Rina Mishra, Gaurav Varshney

The rise of Generative AI (GenAI) has reshaped the cybersecurity landscape by enabling new attack vectors and lowering the barrier for executing advanced social engineering campaigns. This study conducts an empirical analysis of jailbreaking vulnerabilities in ChatGPT-4o-Mini, showing that novices can bypass safeguards to generate complete multivector phishing attacks across email, web, SMS, and voice channels. Controlled experiments reveal that role-based jailbreaks produce fully operational attack paths capable of credential harvesting. User studies further demonstrate the disruptive potential of GenAI: novice participants exhibited a 240\% increase in perceived phishing competence, a 400\% improvement in task completion rates, and a 57\% reduction in implementation time when assisted by GenAI compared to traditional internet resources. To address these risks, a transformer-based detection framework was developed, achieving an F1-score of 0.9864 (XLNET) for identifying malicious prompts. The work underscores the urgency of strengthening LLM guardrails and provides an annotated dataset to support future defenses.

Rina Mishra

2 Papers