Debate-Driven Multi-Agent LLMs for Phishing Email Detection
This work addresses phishing detection for cybersecurity, offering an incremental improvement over traditional methods by leveraging debate mechanisms in LLMs.
The paper tackles phishing email detection by proposing a multi-agent LLM prompting technique that simulates debates among agents to analyze email content. The authors report improved classification accuracy, with mixed-agent configurations outperforming homogeneous ones on multiple datasets.
Phishing attacks remain a critical cybersecurity threat. Attackers constantly refine their methods, making phishing emails harder to detect. Traditional detection methods, including rule-based systems and supervised machine learning models, either rely on predefined patterns such as blacklists, which can be bypassed with slight modifications, or require large training datasets and can still generate false positives and false negatives. In this work, we propose a multi-agent large language model (LLM) prompting technique that simulates debates among agents to detect whether the content of an email is phishing. Our approach uses two LLM agents to present arguments for and against classifying the email as phishing, with a judge agent adjudicating the final verdict based on the quality of the reasoning provided. This debate mechanism enables the models to critically analyze contextual cues and deceptive patterns in text, which leads to improved classification accuracy. We evaluate the proposed framework on multiple phishing email datasets and demonstrate that mixed-agent configurations consistently outperform homogeneous configurations. Results also show that the debate structure itself is sufficient to yield accurate decisions without extra prompting strategies.
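To make the debate structure concrete, the following is a minimal sketch of a two-debater, one-judge loop. It is an illustration under stated assumptions, not the authors' implementation: the names `Agent`, `run_debate`, and the stub agents are hypothetical, the prompt wording and the number of rounds are invented for the example, and each agent is modeled as a plain function from prompt to reply so that no real LLM API is needed.

```python
from typing import Callable, List

# Hypothetical abstraction: an agent is any function from a prompt to a reply.
# In a real system each agent would wrap a call to some LLM.
Agent = Callable[[str], str]


def run_debate(email: str, pro: Agent, con: Agent, judge: Agent,
               rounds: int = 2) -> str:
    """Run a pro/con debate over an email and return the judge's label."""
    transcript: List[str] = []
    for _ in range(rounds):  # the round count is an assumption of this sketch
        # Each debater sees the email plus everything said so far.
        history = "\n".join(transcript)
        transcript.append("PRO: " + pro(
            f"Argue that this email IS phishing.\nEmail: {email}\n{history}"))
        history = "\n".join(transcript)
        transcript.append("CON: " + con(
            f"Argue that this email is NOT phishing.\nEmail: {email}\n{history}"))
    # The judge reads the full debate and is asked for a one-word verdict.
    reply = judge(
        f"Email: {email}\n" + "\n".join(transcript)
        + "\nAnswer with 'phishing' or 'legitimate'.")
    # Map the free-text reply onto one of the two labels.
    return "phishing" if reply.strip().lower().startswith("phishing") else "legitimate"


# Deterministic stand-ins for LLM agents, used only to exercise the loop.
def stub_pro(prompt: str) -> str:
    return "The sender urges immediate action and links to an unfamiliar domain."


def stub_con(prompt: str) -> str:
    return "The greeting is plausible and there is no attachment."


def stub_judge(prompt: str) -> str:
    # Toy rule in place of an LLM judge: side with PRO if urgency was raised.
    return "phishing" if "urges immediate action" in prompt else "legitimate"


if __name__ == "__main__":
    print(run_debate("Your account is locked, click here now.",
                     stub_pro, stub_con, stub_judge))
```

Because the three roles are passed in as independent functions, a mixed-agent configuration in this sketch would simply mean backing `pro`, `con`, and `judge` with different underlying models, while a homogeneous configuration would reuse the same one for all three.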