On The Dangers of Poisoned LLMs In Security Automation
It addresses security vulnerabilities in applied LLMs for security automation, highlighting incremental risks from targeted poisoning attacks.
The paper investigates the risks of LLM poisoning in security automation, demonstrating that fine-tuned models like Llama3.1 8B and Qwen3 4B can be biased to consistently dismiss true positive alerts from a specific user, bypassing an LLM-based alert investigator.
This paper investigates some of the risks introduced by "LLM poisoning," the intentional or unintentional introduction of malicious or biased data during model training. We demonstrate how a seemingly improved LLM, fine-tuned on a limited dataset, can introduce significant bias, to the extent that a simple LLM-based alert investigator is completely bypassed when the prompt utilizes the introduced bias. Using fine-tuned Llama3.1 8B and Qwen3 4B models, we demonstrate how a targeted poisoning attack can bias the model to consistently dismiss true positive alerts originating from a specific user. Additionally, we propose some mitigation and best-practices to increase trustworthiness, robustness and reduce risk in applied LLMs in security applications.