Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection
This addresses the costly problem of BEC fraud for organizations, offering both high-accuracy and cost-effective detection options, though it is incremental as it compares existing methods.
This paper tackles Business Email Compromise (BEC) detection by comparing a deep learning semantic approach (DistilBERT) with a psycholinguistic method (CatBoost). DistilBERT achieved near-perfect detection on synthetic threats (AUC >0.99, F1=0.998) with 7.4 ms latency, while CatBoost provided competitive detection (AUC=0.991, F1=0.949) at 8.4x lower latency (0.8 ms).
Business Email Compromise (BEC) is a sophisticated social engineering threat that manipulates organizational hierarchies, leading to significant financial damage. According to the 2024 FBI Internet Crime Report, BEC accounts for over $2.9 billion in annual losses, presenting a massive economic asymmetry: the financial cost of a False Negative (fraud loss) exceeds the operational cost of a False Positive (manual review) by a ratio of approximately 5,480:1. This paper contrasts two detection paradigms: a Forensic Psycholinguistic Stream (CatBoost), which analyzes linguistic cues like urgency and authority with high interpretability, and a Semantic Stream (DistilBERT), which utilizes deep learning for contextual understanding. We evaluated both streams on a hybrid dataset (N=7,990) containing human-legitimate and AI-synthesized adversarial fraud. Benchmarked on Tesla T4 infrastructure, DistilBERT achieved near-perfect detection on synthetic threats (AUC >0.99, F1 =0.998) with acceptable real-time latency (7.4 ms). CatBoost achieved competitive detection (AUC =0.991, F1 =0.949) at 8.4x lower latency (0.8 ms) with negligible resource consumption. We conclude that while DistilBERT offers maximum accuracy for GPU-equipped organizations, CatBoost provides a viable, cost-effective alternative for edge deployments. Both approaches demonstrate a theoretical ROI exceeding 99.9% when optimized via cost-sensitive learning.