CR AI | Nov 6, 2025

Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models

arXiv:2511.04728v1 | 1 citation | h-index: 1

Originality: Incremental advance

AI Analysis

The paper addresses the need for trust-aware evaluation when deploying LLMs for phishing detection. The advance is incremental: it builds on existing methods and combines them into a new framework.

The study assesses how reliable large language models (LLMs) are at phishing email detection, looking beyond accuracy alone. It introduces the Trustworthiness Calibration Framework (TCF) and the Trustworthiness Calibration Index (TCI), and reports that GPT-4 achieved the strongest overall trust profile in experiments across five corpora.

Phishing emails continue to pose a persistent challenge to online communication, exploiting human trust and evading automated filters through realistic language and adaptive tactics. While large language models (LLMs) such as GPT-4 and LLaMA-3-8B achieve strong accuracy in text classification, their deployment in security systems requires assessing reliability beyond benchmark performance. To address this gap, the study introduces the Trustworthiness Calibration Framework (TCF), a reproducible methodology for evaluating phishing detectors across three dimensions: calibration, consistency, and robustness. These components are integrated into a bounded index, the Trustworthiness Calibration Index (TCI), and complemented by the Cross-Dataset Stability (CDS) metric, which quantifies how stable trustworthiness is across datasets. Experiments on five corpora (SecureMail 2025, Phishing Validation 2024, CSDMC2010, Enron-Spam, and Nazario) using DeBERTa-v3-base, LLaMA-3-8B, and GPT-4 show that GPT-4 achieves the strongest overall trust profile, followed by LLaMA-3-8B and DeBERTa-v3-base. Statistical analysis confirms that reliability varies independently of raw accuracy, underscoring the importance of trust-aware evaluation for real-world deployment. The proposed framework establishes a transparent and reproducible foundation for assessing model dependability in LLM-based phishing detection.
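The abstract does not state how TCI or CDS are computed, so the following is only an illustrative sketch of how three trust dimensions could be folded into a bounded index. The expected calibration error (ECE) is a standard calibration metric; the functions `trust_index` and `cross_dataset_stability`, the equal weights, and the "1 minus standard deviation" form are assumptions made for illustration and are not the paper's definitions.

```python
from statistics import mean, pstdev


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin-size-weighted mean of |accuracy - confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = mean(c for c, _ in b)
            acc = mean(1.0 if ok else 0.0 for _, ok in b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece


def trust_index(calibration, consistency, robustness, weights=(1, 1, 1)):
    """Hypothetical bounded index: weighted mean of three scores in [0, 1].

    ASSUMPTION: the paper's actual TCI formula is not given in the abstract.
    """
    scores = (calibration, consistency, robustness)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each component score must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def cross_dataset_stability(indices):
    """Hypothetical stability score: 1 minus the spread of per-dataset indices.

    ASSUMPTION: the paper's actual CDS formula is not given in the abstract.
    """
    return 1.0 - pstdev(indices)
```

Under these assumptions, a calibration score could be taken as `1 - expected_calibration_error(...)`, combined with consistency and robustness scores via `trust_index`, and the per-corpus results passed to `cross_dataset_stability`. A weighted mean keeps the index within [0, 1] whenever each component is, which is one simple way to obtain the boundedness the abstract mentions.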
