EnCoD: Distinguishing Compressed and Encrypted File Fragments
This work addresses a critical issue for security practitioners working on ransomware detection and digital forensics. Its contribution is incremental, however, since it builds on prior work to improve accuracy.
The paper tackles the problem of distinguishing encrypted file fragments from compressed ones, which is crucial for security applications. It shows that existing statistical tests fail even for large fragments, and it proposes EnCoD, a learning-based classifier that reliably distinguishes the two starting from 512-byte fragments and outperforms state-of-the-art methods for most fragment sizes and data types.
Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g., office documents and media files) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few select data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms the current state of the art for most considered fragment sizes and data types.
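The entropy problem described in the abstract can be illustrated with a minimal sketch. It is not EnCoD and not one of the statistical tests the paper evaluates; it only computes the empirical byte entropy of 512-byte fragments. The vocabulary, the use of zlib as the compressor, and the use of `os.urandom` as a stand-in for ciphertext are all assumptions made for illustration.

```python
import math
import os
import random
import zlib
from collections import Counter

FRAGMENT_SIZE = 512  # smallest fragment size mentioned in the abstract


def shannon_entropy(fragment: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(fragment).values()
    )


# Hypothetical plaintext: words drawn at random from a tiny vocabulary, so the
# compressed stream is long enough to cut a fragment from the middle of it.
rng = random.Random(0)
vocabulary = [
    "file", "fragment", "entropy", "cipher", "archive", "random", "byte", "block",
    "stream", "header", "test", "data", "key", "disk", "image", "text",
]
plain = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")

# Assumption: zlib (DEFLATE) stands in for "a compression algorithm".
compressed = zlib.compress(plain, 9)

fragments = {
    "plain text": plain[:FRAGMENT_SIZE],
    # Skip the first 1024 bytes so the fragment comes from mid-stream.
    "zlib-compressed": compressed[1024:1024 + FRAGMENT_SIZE],
    # Assumption: OS randomness approximates the byte statistics of ciphertext.
    "random (ciphertext stand-in)": os.urandom(FRAGMENT_SIZE),
}

entropies = {name: shannon_entropy(data) for name, data in fragments.items()}
for name, value in entropies.items():
    print(f"{name:30s} {value:.2f} bits/byte")
```

In this sketch the text fragment scores far lower than the other two, while the compressed and random fragments both land near the top of the range. An entropy threshold that flags the random fragment therefore also flags the compressed one, which is the failure mode of entropy-based detectors that the abstract points to.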