CR · MM · Aug 16, 2019

FiFTy: Large-scale File Fragment Type Identification using Neural Networks

arXiv:1908.06148v2

This work addresses file fragment type identification for digital forensics practitioners, offering substantial speed and accuracy improvements over existing tools.

The authors tackled file fragment type identification for memory forensics and data carving by developing FiFTy, a neural network-based tool that achieves an average accuracy of 77.5% at a processing time of about 38 s/GB, outperforming the previous state-of-the-art tool, Sceadan (69% at 9 min/GB).
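The speed comparison can be checked with simple arithmetic. A minimal sketch, taking the quoted per-gigabyte times at face value and assuming, for illustration only, a 1 TB disk image of 1000 GB:

```python
def speedup(baseline_s_per_gb: float, new_s_per_gb: float) -> float:
    """Ratio of per-gigabyte processing times (larger means the new tool is faster)."""
    return baseline_s_per_gb / new_s_per_gb


SCEADAN_S_PER_GB = 9 * 60  # 9 min/GB, as quoted for Sceadan
FIFTY_S_PER_GB = 38        # about 38 s/GB, as quoted for FiFTy
IMAGE_GB = 1000            # hypothetical 1 TB image, chosen for illustration

ratio = speedup(SCEADAN_S_PER_GB, FIFTY_S_PER_GB)
print(f"speed-up: {ratio:.1f}x")  # 540 / 38, roughly 14.2

# Wall-clock hours to classify every fragment of the hypothetical image.
for name, s_per_gb in [("Sceadan", SCEADAN_S_PER_GB), ("FiFTy", FIFTY_S_PER_GB)]:
    print(f"{name}: {s_per_gb * IMAGE_GB / 3600:.1f} h")
```

A ratio of about 14 is what "more than an order of magnitude faster" refers to; on the hypothetical 1 TB image this is the difference between roughly 150 hours and roughly 10.6 hours.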

We present FiFTy, a modern file type identification tool for memory forensics and data carving. In contrast to previous approaches based on hand-crafted features, we design a compact neural network architecture that uses a trainable embedding space, akin to successful natural language processing models. Our approach dispenses with explicit feature extraction, which is a bottleneck in legacy systems. We evaluate the proposed method on a novel dataset with 75 file types, the most diverse and balanced dataset reported to date. FiFTy consistently outperforms all baselines in terms of speed, accuracy and individual misclassification rates. We achieved an average accuracy of 77.5% at a processing time of approximately 38 s/GB. Compared with the previous state-of-the-art tool, Sceadan (69% at 9 min/GB), this is both more accurate and more than an order of magnitude faster. Our tool and the corresponding dataset are publicly available online.
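To make "trainable embedding space instead of explicit feature extraction" concrete, here is a minimal pure-Python sketch of the forward pass of an embedding-plus-convolution classifier over raw bytes. It is loosely in the spirit of the approach described above, not FiFTy's actual architecture: the single convolution layer, the layer sizes (`EMBED_DIM`, `NUM_FILTERS`, `KERNEL`), the 512-byte stand-in fragment, and the function names are all hypothetical choices for illustration. The weights are random, so the output probabilities are meaningless until the parameters are trained.

```python
import math
import random

# Hypothetical sizes chosen for illustration; not the paper's hyperparameters.
EMBED_DIM = 8      # width of each learned byte vector
NUM_FILTERS = 16   # number of 1-D convolution filters
KERNEL = 3         # convolution window, in bytes
NUM_CLASSES = 75   # number of file types in the dataset described above


def init_params(seed: int = 0) -> dict:
    """Randomly initialise the parameters that training would normally learn."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> list:
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(256, EMBED_DIM),  # one trainable row per byte value 0..255
        "conv": [mat(KERNEL, EMBED_DIM) for _ in range(NUM_FILTERS)],
        "conv_b": [0.0] * NUM_FILTERS,
        "dense": mat(NUM_FILTERS, NUM_CLASSES),
        "dense_b": [0.0] * NUM_CLASSES,
    }


def forward(fragment: bytes, p: dict) -> list:
    """Map a raw byte fragment to one probability per file type."""
    if len(fragment) < KERNEL:
        raise ValueError("fragment must be at least KERNEL bytes long")
    # 1. Embedding lookup: each byte indexes a trainable vector (no hand-crafted features).
    x = [p["embed"][b] for b in fragment]
    feats = []
    for f in range(NUM_FILTERS):
        w = p["conv"][f]
        acts = []
        # 2. Valid (unpadded) 1-D convolution over byte positions, followed by ReLU.
        for i in range(len(x) - KERNEL + 1):
            s = p["conv_b"][f]
            for k in range(KERNEL):
                s += sum(a * b for a, b in zip(w[k], x[i + k]))
            acts.append(max(0.0, s))
        # 3. Global average pooling collapses all positions into one value per filter.
        feats.append(sum(acts) / len(acts))
    # 4. Dense layer, then a numerically stable softmax over the file-type classes.
    logits = [
        p["dense_b"][c] + sum(feats[f] * p["dense"][f][c] for f in range(NUM_FILTERS))
        for c in range(NUM_CLASSES)
    ]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


params = init_params()
fragment = bytes(random.Random(1).randrange(256) for _ in range(512))  # stand-in 512-byte block
probs = forward(fragment, params)
print(len(probs), round(sum(probs), 6))
```

The design point the sketch illustrates is that the only input is the byte sequence itself: the embedding table, filters, and dense weights would all be fitted by gradient descent on labelled fragments, typically in a deep-learning framework rather than with pure-Python loops.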

Code Implementations: 1 repo