Adversarial Networks and Machine Learning for File Classification
This work addresses the need for accurate file classification in forensic investigations, particularly when files are intentionally concealed. Its contribution is incremental, however: it applies an existing adversarial method to a specific domain.
The paper tackles the problem of identifying file types when extensions or headers are obfuscated. It uses a semi-supervised generative adversarial network (SGAN) that achieved 97.6% accuracy across 11 file types and outperformed a standalone neural network and three other machine learning algorithms, especially when few supervised (labeled) samples were available.
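Extensions and headers may be obfuscated, so any input representation has to rely on the file body rather than its metadata. One common, simple choice is a normalized byte-frequency histogram. The sketch below assumes that representation and adds a hypothetical `skip_header` parameter. The paper's actual input encoding is not described here and may differ.

```python
from collections import Counter
from typing import List


def byte_histogram(data: bytes, skip_header: int = 0) -> List[float]:
    """Return a 256-bin normalized byte-frequency vector for `data`.

    `skip_header` is a hypothetical parameter: it drops the first N bytes so
    that a tampered magic number or header does not affect the features.
    """
    body = data[skip_header:]
    if not body:
        # Nothing left to describe: return an all-zero feature vector.
        return [0.0] * 256
    counts = Counter(body)  # iterating over bytes yields ints in 0..255
    total = len(body)
    return [counts.get(value, 0) / total for value in range(256)]


# Usage: the two inputs differ only in their first four bytes (the "header").
png_like = b"\x89PNG" + bytes(range(16)) * 4
disguised = b"XXXX" + bytes(range(16)) * 4
same = byte_histogram(png_like, skip_header=4) == byte_histogram(disguised, skip_header=4)  # True
```

A histogram discards byte order, which makes it robust to a rewritten header but blind to structure. A real classifier would likely use richer features or raw byte windows.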
Correctly identifying the type of a file under examination is a critical part of a forensic investigation. The file type alone suggests the embedded content, such as a picture, video, manuscript, or spreadsheet. For cases where a system owner may wish to keep files inaccessible or their types concealed, we propose an adversarially trained neural network that determines a file's true type even when its extension or file header has been obfuscated to hinder discovery. Our semi-supervised generative adversarial network (SGAN) achieved 97.6% accuracy in classifying files across 11 different types. We also compared our network against a traditional standalone neural network and three other machine learning algorithms. The adversarially trained network proved to be the most precise file classifier, especially in scenarios with few supervised samples available. Our SGAN-based file classifier is available on GitHub (https://ksaintg.github.io/SGAN-File-Classier).
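The semi-supervised idea behind an SGAN classifier can be illustrated with the common K+1-class discriminator formulation. The discriminator emits one logit per real class (here K = 11, matching the 11 file types above) plus one extra "generated" class. The sketch below computes the three discriminator loss terms from raw logits using only the standard library. The function names, the equal weighting of the terms, and the use of this particular formulation are assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

NUM_FILE_TYPES = 11          # K real classes, from the abstract
FAKE = NUM_FILE_TYPES        # index of the extra "generated" class (assumed layout)


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def supervised_loss(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for a labeled real file over all K+1 outputs."""
    return logsumexp(logits) - logits[label]


def unsupervised_real_loss(logits: Sequence[float]) -> float:
    """-log(1 - p(fake | x)) for an unlabeled real file, computed stably
    as logsumexp(all logits) - logsumexp(real-class logits)."""
    return logsumexp(logits) - logsumexp(logits[:FAKE])


def unsupervised_fake_loss(logits: Sequence[float]) -> float:
    """-log p(fake | G(z)) for a generator sample."""
    return logsumexp(logits) - logits[FAKE]


def discriminator_loss(
    labeled: List[Tuple[Sequence[float], int]],
    unlabeled: List[Sequence[float]],
    generated: List[Sequence[float]],
) -> float:
    """Mean supervised loss plus mean unsupervised losses (equal weights assumed)."""
    sup = sum(supervised_loss(l, y) for l, y in labeled) / len(labeled)
    real = sum(unsupervised_real_loss(l) for l in unlabeled) / len(unlabeled)
    fake = sum(unsupervised_fake_loss(l) for l in generated) / len(generated)
    return sup + real + fake


# Usage: all-zero logits give a uniform distribution over the 12 outputs.
uniform = [0.0] * (NUM_FILE_TYPES + 1)
total = discriminator_loss([(uniform, 3)], [uniform], [uniform])
```

The unlabeled and generated terms are what let the network learn from files without type labels. This is consistent with the abstract's observation that the adversarial model helps most when supervised samples are scarce. The generator's own update step is omitted here.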