LG | Jul 11, 2024

Uncovering Semantics and Topics Utilized by Threat Actors to Deliver Malicious Attachments and URLs

arXiv:2407.08888v1 | 2.6 | h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses email-based malware detection for cybersecurity professionals, but it appears incremental, as it applies existing NLP methods to a known problem domain.

The study tackles the detection of malicious email attachments and URLs by analyzing semantic and thematic patterns in emails with BERTopic and clustering algorithms. It yields insights into common threat actor tactics but reports no concrete performance numbers.

Recent threat reports highlight that email remains the top vector for delivering malware to endpoints. Despite these statistics, the detection of malicious email attachments and URLs often neglects semantic cues: linguistic features and contextual clues. Our study employs BERTopic unsupervised topic modeling to identify the common semantics and themes embedded in emails that deliver malicious attachments and call-to-action URLs. We preprocess emails by extracting and sanitizing their content, and we use multilingual embedding models such as BGE-M3 to produce dense representations, which clustering algorithms (HDBSCAN and OPTICS) use to group emails by semantic similarity. Phi3-Mini-4K-Instruct facilitates semantic analysis and hLDA aids thematic analysis, helping to characterize threat actor patterns. Our research will evaluate and compare the clustering algorithms on topic quantity, coherence, and diversity metrics, concluding with insights into the semantics and topics commonly used by threat actors to deliver malicious attachments and URLs, a significant contribution to the field of threat detection.
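The abstract does not spell out what "extracting and sanitizing content" involves. The following is a hypothetical sketch, using only Python's standard `email` and `html.parser` modules, of one way such a step could look: it pulls visible text from plain-text and HTML parts, records attachment filenames and link targets separately, and replaces URLs and email addresses with placeholder tokens so they do not dominate the embeddings. The function name `preprocess_email` and the `[URL]`/`[EMAIL]` tokens are assumptions, not the authors' pipeline.

```python
import re
from email import message_from_string, policy
from html.parser import HTMLParser

# Hypothetical masking patterns; real pipelines may need stricter rules.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


class _TextExtractor(HTMLParser):
    """Collects visible text and <a href> targets, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts, self.links = [], []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preprocess_email(raw: str) -> dict:
    """Hypothetical preprocessing: extract text, keep links/attachments, mask URLs/addresses."""
    msg = message_from_string(raw, policy=policy.default)
    texts, links, attachments = [], [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no content of their own
        filename = part.get_filename()
        if filename or part.get_content_disposition() == "attachment":
            attachments.append(filename or "")
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain":
            texts.append(part.get_content())
        elif ctype == "text/html":
            extractor = _TextExtractor()
            extractor.feed(part.get_content())
            extractor.close()  # flush any buffered trailing text
            texts.append(" ".join(extractor.parts))
            links.extend(extractor.links)
    body = " ".join(texts)
    # Record URLs found in the text, trimming trailing punctuation.
    urls = links + [u.rstrip(".,;:!?)]'\"") for u in URL_RE.findall(body)]
    body = URL_RE.sub(" [URL] ", body)
    body = EMAIL_RE.sub(" [EMAIL] ", body)
    body = re.sub(r"\s+", " ", body).strip()
    return {
        "subject": str(msg.get("Subject", "")),
        "text": body,
        "urls": urls,
        "attachments": attachments,
    }
```

Under these assumptions, the masked `text` is what would be passed to an embedding model such as BGE-M3, while the separately kept `urls` and `attachments` could be joined back to each cluster when interpreting its topics.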
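The abstract names topic quantity, coherence, and diversity as evaluation criteria but does not say which coherence measure is used. As a minimal pure-Python sketch, assuming NPMI computed from document co-occurrence and the common definition of topic diversity (the share of unique words among the top-k words of all topics), the three metrics could be computed as below. The function names are hypothetical; in practice the topic word lists would come from BERTopic's topic representations and the labels from HDBSCAN or OPTICS, which (in scikit-learn and the `hdbscan` package) mark noise points with `-1`.

```python
import math
from itertools import combinations


def topic_count(labels):
    """Topic quantity: number of distinct clusters, ignoring the noise label -1."""
    return len(set(labels) - {-1})


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of all topics (1.0 = no overlap)."""
    words = [w for topic in topics for w in topic[:top_k]]
    return len(set(words)) / len(words) if words else 0.0


def npmi_coherence(topics, documents, top_k=10):
    """Mean pairwise NPMI of each topic's top words, from document co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    if not doc_sets:
        return 0.0
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for wi, wj in combinations(topic[:top_k], 2):
            p_ij = prob(wi, wj)
            if p_ij == 0:
                pair_scores.append(-1.0)  # words never co-occur: minimum NPMI
                continue
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            denom = -math.log(p_ij)
            # If both words appear in every document, NPMI is defined as 1.
            pair_scores.append(pmi / denom if denom > 0 else 1.0)
        if pair_scores:
            topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores) if topic_scores else 0.0
```

NPMI ranges from -1 to 1, so higher values indicate topics whose top words tend to appear together. Libraries such as gensim offer `CoherenceModel` with `coherence="c_npmi"`, but it estimates probabilities over sliding windows rather than whole documents, so its scores will not match this sketch exactly.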
