Would a File by Any Other Name Seem as Malicious?
This work addresses malware detection for security analysts in high-volume or restricted-access scenarios, but the contribution is incremental: it applies existing neural network methods to a new feature, file names.
The authors tackle the problem of detecting malware when file contents are unavailable by showing that file names can predict malware presence, demonstrating this with a character-level convolutional neural network on the EMBER benchmark dataset.
Successful malware attacks on information technology systems can cause millions of dollars in damage, the exposure of sensitive and private information, and the irreversible destruction of data. Anti-virus systems that analyze a file's contents use a combination of static and dynamic analysis to detect and remove/remediate such malware. However, examining a file's entire contents is not always possible in practice, as the volume and velocity of incoming data may be too high, or access to the underlying file contents may be restricted or unavailable. If it were possible to obtain estimates of a file's relative likelihood of being malicious without looking at the file contents, we could better prioritize file processing order and aid analysts in situations where a file is unavailable. In this work, we demonstrate that file names can contain information predictive of the presence of malware in a file. In particular, we show the effectiveness of a character-level convolutional neural network at predicting malware status using file names on Endgame's EMBER malware detection benchmark dataset.
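To make the idea of a character-level convolutional network over file names more concrete, the following is a minimal sketch of the forward pass only: characters are mapped to integer ids, embedded, passed through a 1D convolution with ReLU, max-pooled over positions, and squashed to a score in [0, 1]. Everything here is an assumption for illustration, not the authors' architecture: the alphabet, MAX_LEN, EMBED_DIM, NUM_FILTERS, KERNEL, and the function names encode, init_params, and score are all hypothetical, and the weights are random rather than trained on EMBER.

```python
import math
import random

# Hypothetical settings chosen for illustration; none come from the paper.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789._-() "
MAX_LEN = 32      # file names are truncated or padded to this many characters
EMBED_DIM = 4     # size of each character embedding
NUM_FILTERS = 3   # number of convolutional filters
KERNEL = 3        # each filter spans 3 consecutive characters

PAD, UNK = 0, 1   # reserved ids for padding and out-of-alphabet characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode(name):
    """Map a file name to a fixed-length list of integer character ids."""
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in name.lower()[:MAX_LEN]]
    return ids + [PAD] * (MAX_LEN - len(ids))


def init_params(seed=0):
    """Random, untrained weights; a real model would learn these from labels."""
    rng = random.Random(seed)
    vocab = len(ALPHABET) + 2
    embed = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
             for _ in range(vocab)]
    filters = [[[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                for _ in range(KERNEL)]
               for _ in range(NUM_FILTERS)]
    out_w = [rng.uniform(-1, 1) for _ in range(NUM_FILTERS)]
    return embed, filters, out_w


def score(name, params):
    """Forward pass: embed -> 1D conv + ReLU -> global max pool -> sigmoid."""
    embed, filters, out_w = params
    x = [embed[i] for i in encode(name)]
    pooled = []
    for f in filters:
        acts = []
        for start in range(MAX_LEN - KERNEL + 1):
            s = sum(f[k][d] * x[start + k][d]
                    for k in range(KERNEL) for d in range(EMBED_DIM))
            acts.append(max(0.0, s))   # ReLU
        pooled.append(max(acts))       # strongest response at any position
    logit = sum(w * p for w, p in zip(out_w, pooled))
    return 1.0 / (1.0 + math.exp(-logit))
```

With trained weights, such scores could be used to order a queue, for example sorted(names, key=lambda n: score(n, params), reverse=True), so that the names judged most suspicious are examined first. With the random weights above the ordering is meaningless; the sketch only shows the data flow.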