Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector
Holmes addresses enterprise email security threats such as phishing and fraud. It offers an incremental improvement over traditional signature-based filtering by using semantic analysis to catch evolving malicious payloads.
The paper tackles the detection of novel and unknown malicious emails that evade traditional signature-based filters. It presents Holmes, a semantic-based engine that converts each email event log into a sentence through word embedding and identifies sentences that deviate from a historical baseline. In a real-world enterprise environment handling around 5,000 emails per day, Holmes showed a high capability to detect threats missed by the existing anti-spam gateway and by several commercial tools.
Email threats are a serious issue for enterprise security. They take various malicious forms, such as phishing, fraud, blackmail, and malvertising. A traditional anti-spam gateway often maintains a greylist to filter out unexpected emails based on suspicious vocabulary in the email's subject and content. However, this type of signature-based approach cannot effectively discover novel and unknown suspicious emails that use evolving malicious payloads. To address this problem, we present Holmes, an efficient and lightweight semantic-based engine for anomalous email detection. Holmes converts each email event log into a sentence through word embedding and then identifies abnormalities, that is, translated sentences that deviate from a historical baseline. We have evaluated Holmes in a real-world enterprise environment where around 5,000 emails are sent or received each day. In our experiments, Holmes shows a high capability to detect email threats, especially those that the enterprise anti-spam gateway cannot handle. Our experiments also demonstrate that Holmes can discover more concealed malicious emails that evade several commercial detection tools.
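The pipeline described in the abstract (log to sentence, sentence to vector, vector compared against a historical baseline) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the field names, the "field=value" tokenization, the hash-based pseudo-embedding (a stand-in for a trained word embedding), the averaging of token vectors, and the nearest-neighbor cosine distance used as the anomaly score.

```python
import hashlib
import math

DIM = 32  # vector size; equals the SHA-256 digest length used below


def log_to_sentence(event: dict) -> list[str]:
    """Flatten one email event log into a 'sentence' of field=value tokens."""
    return [f"{key}={str(value).lower()}" for key, value in sorted(event.items())]


def token_vector(token: str) -> list[float]:
    """Deterministic pseudo-embedding derived from a hash.

    Assumption: this only stands in for a trained word embedding,
    which would place related tokens close to each other.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(byte / 255.0) * 2.0 - 1.0 for byte in digest[:DIM]]


def sentence_vector(tokens: list[str]) -> list[float]:
    """Average the token vectors to get one vector per sentence."""
    vectors = [token_vector(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is empty or zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def anomaly_score(event: dict, baseline: list[dict]) -> float:
    """Distance from the event to its nearest sentence in the baseline."""
    vec = sentence_vector(log_to_sentence(event))
    return min(
        cosine_distance(vec, sentence_vector(log_to_sentence(past)))
        for past in baseline
    )


# Hypothetical historical baseline of email event logs.
baseline = [
    {"sender_domain": "corp.example", "attachment": "pdf", "hour": 10},
    {"sender_domain": "corp.example", "attachment": "none", "hour": 14},
]
# An event whose field values never appear in the baseline.
novel = {"sender_domain": "unknown.example", "attachment": "exe", "hour": 3}
```

Calling `anomaly_score(novel, baseline)` yields a larger value than scoring an event already present in the baseline, which scores approximately zero. A real deployment would still need a threshold for flagging, chosen from historical data; the sketch leaves that choice open.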