CL CR LG
Mar 15, 2022

TSM: Measuring the Enticement of Honeyfiles with Natural Language Processing

Roelien C. Timmer, David Liebowitz, Surya Nepal, Salil Kanhere

arXiv:2203.07580

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for better breach detection and intent analysis in cybersecurity. Its contribution is incremental, since it applies existing NLP techniques to a specific domain.

The paper tackles the problem of measuring honeyfile enticement in cyber deception. It introduces Topic Semantic Matching (TSM), a metric that combines topic modelling with semantic matching to compare honeyfile text against topic words, and reports that the metric is effective in inter-corpus comparisons.

Honeyfile deployment is a useful breach detection method in cyber deception that can also inform defenders about the intent and interests of intruders and malicious insiders. A key property of a honeyfile, enticement, is the extent to which the file can attract an intruder to interact with it. We introduce a novel metric, Topic Semantic Matching (TSM), which uses topic modelling to represent files in the repository and semantic matching in an embedding vector space to compare honeyfile text and topic words robustly. We also present a honeyfile corpus created with different Natural Language Processing (NLP) methods. Experiments show that TSM is effective in inter-corpus comparisons and is a promising tool to measure the enticement of honeyfiles. TSM is the first measure to use NLP techniques to quantify the enticement of honeyfile content that compares the essential topical content of local contexts to honeyfiles and is robust to paraphrasing.
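The abstract describes the matching step only in outline, so the following is a minimal sketch of the general idea, not the authors' formulation of TSM. It assumes that topic words have already been extracted from the repository by some topic model, and it scores a honeyfile by averaging, over topic words, the best cosine similarity to any word in the honeyfile. The function name `topic_match_score`, the aggregation rule, and the three-dimensional `TOY_VECTORS` are all hypothetical and chosen for illustration; a real system would use pretrained word embeddings.

```python
import math

# Toy 3-d word vectors invented for this sketch (assumption: a real system
# would load pretrained embeddings instead).
TOY_VECTORS = {
    "invoice": (0.9, 0.1, 0.0),
    "payment": (0.8, 0.2, 0.1),
    "billing": (0.85, 0.15, 0.05),
    "salary": (0.7, 0.3, 0.1),
    "holiday": (0.0, 0.2, 0.9),
    "beach": (0.1, 0.1, 0.95),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_match_score(topic_words, honeyfile_words, vectors=TOY_VECTORS):
    """Hypothetical score: mean over topic words of the best cosine
    similarity to any honeyfile word. Words without a vector are skipped."""
    topic_vecs = [vectors[w] for w in topic_words if w in vectors]
    file_vecs = [vectors[w] for w in honeyfile_words if w in vectors]
    if not topic_vecs or not file_vecs:
        return 0.0
    best = [max(cosine(t, f) for f in file_vecs) for t in topic_vecs]
    return sum(best) / len(best)


# Topic words assumed to come from a topic model run over the repository.
repo_topic = ["invoice", "payment"]
on_topic = topic_match_score(repo_topic, ["billing", "salary"])
off_topic = topic_match_score(repo_topic, ["holiday", "beach"])
print(on_topic > off_topic)
```

Because the comparison happens in the embedding space and not on exact word overlap, the on-topic honeyfile scores highly even though it shares no words with the topic. This is the kind of robustness to paraphrasing that the abstract attributes to TSM.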
