On the Role of Similarity in Detecting Masquerading Files
This work addresses a security vulnerability in machine learning-based systems, but the contribution is incremental because it builds on existing similarity methods.
The paper tackles the problem of masquerading files, which bad actors craft to mimic legitimate samples and circumvent machine learning security solutions. It analyzes the interplay between digital signatures and machine learning solutions, uses similarity and clustering techniques to find masquerading files, and offers improvements based on the results.
Similarity has been applied to a wide range of security applications, typically within machine learning models. We examine the problem posed by masquerading samples, that is, samples crafted by bad actors to be similar or nearly identical to legitimate samples. We find that these samples can create significant problems for machine learning solutions. The primary problem is that bad actors can use masquerading samples to circumvent those solutions. We then examine the interplay between digital signatures and machine learning solutions, focusing on executable files and code signing. We offer a taxonomy of masquerading files and use a combination of similarity and clustering to find them. Finally, we use the insights gathered in this process to offer improvements to similarity-based and machine learning security solutions.