CR | Jun 12, 2015

Breaking Bad: Detecting malicious domains using word segmentation

arXiv:1506.04111v1 | 28 citations
Originality: Incremental advance
AI Analysis

This work addresses the mobile security threat that malicious domains pose to network operators. It is an incremental advance because it builds on existing lexical analysis methods.

The paper tackles the detection of malicious domains in cellular networks by applying word segmentation to domain names, which expands the feature set. Detection performance improves, as measured by lower misclassification rates and larger areas under the ROC curve.

In recent years, vulnerable hosts and maliciously registered domains have been frequently involved in mobile attacks. In this paper, we explore the feasibility of detecting malicious domains visited on a cellular network based solely on lexical characteristics of the domain names. In addition to using traditional quantitative features of domain names, we also use a word segmentation algorithm to segment the domain names into individual words to greatly expand the size of the feature set. Experiments on a sample of real-world data from a large cellular network show that using word segmentation improves our ability to detect malicious domains relative to approaches without segmentation, as measured by misclassification rates and areas under the ROC curve. Furthermore, the results are interpretable, allowing one to discover (with little supervision or tuning required) which words are used most often to attract users to malicious domains. Such a lightweight approach could be performed in near-real time when a device attempts to visit a domain. This approach can complement (rather than substitute) other more expensive and time-consuming approaches to similar problems that use richer feature sets.
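The abstract does not say which segmentation algorithm or which quantitative features the authors use, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a dictionary-based segmenter that picks the lowest-cost split by dynamic programming, and it combines the resulting words with a few simple counts. The vocabulary `WORD_FREQ`, its frequencies, the unknown-chunk penalty, and the function names `segment` and `lexical_features` are all hypothetical, chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy vocabulary with made-up frequencies (not from the paper).
WORD_FREQ = {
    "free": 50, "movie": 30, "movies": 20, "online": 40,
    "bank": 25, "login": 15, "secure": 10, "pay": 20, "pal": 5,
}
TOTAL = sum(WORD_FREQ.values())


def word_cost(token):
    """Cost of one chunk: negative log-probability for known words,
    and an assumed fixed-plus-per-character penalty for unknown chunks."""
    if token in WORD_FREQ:
        return -math.log(WORD_FREQ[token] / TOTAL)
    return 10.0 + 2.0 * len(token)


def segment(text):
    """Split a run of letters into the lowest-cost sequence of chunks."""
    n = len(text)
    best = [0.0] + [math.inf] * n   # best[i]: minimal cost of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last chunk
    for end in range(1, n + 1):
        for start in range(end):
            cost = best[start] + word_cost(text[start:end])
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    # Walk the back-pointers from the end to recover the chunks.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


def lexical_features(domain):
    """Quantitative features plus bag-of-words features for one domain."""
    # Simplification: drop only the final label (treated as the TLD).
    name = domain.lower().rsplit(".", 1)[0]
    features = Counter()
    features["length"] = len(name)
    features["num_digits"] = sum(ch.isdigit() for ch in name)
    features["num_hyphens"] = name.count("-")
    # Digits, hyphens, and dots act as natural word boundaries.
    for run in re.findall(r"[a-z]+", name):
        for token in segment(run):
            features["word=" + token] += 1
    return dict(features)


print(segment("freemoviesonline"))
print(lexical_features("free-movies4online.com"))
```

Feature dictionaries of this shape could be passed to any standard classifier. Because each `word=...` feature corresponds to a readable token, the weights a linear model assigns to them can be inspected directly, which is one way the interpretability mentioned in the abstract could arise.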

