Learning the Language of NVMe Streams for Ransomware Detection
This work addresses ransomware detection for data storage systems and their users by applying language modeling techniques to NVMe command streams.
The authors detect ransomware activity in NVMe command sequences with two transformer-based models. Compared with state-of-the-art tabular methods, the models improve by up to 24% in missed-detection rate, 66% in data loss prevention, and 84% in identifying data accessed by ransomware.
We apply language modeling techniques to detect ransomware activity in NVMe command sequences. We design and train two types of transformer-based models: the Command-Level Transformer (CLT) performs in-context token classification to determine whether individual commands are initiated by ransomware, and the Patch-Level Transformer (PLT) predicts the volume of data accessed by ransomware within a patch of commands. We present both model designs and the corresponding tokenization and embedding schemes and show that they improve over state-of-the-art tabular methods by up to 24% in missed-detection rate, 66% in data loss prevention, and 84% in identifying data accessed by ransomware.
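To make the two prediction tasks concrete, the following is a minimal sketch of how training targets for a command-level classifier and a patch-level volume predictor could be derived from a labeled command trace. It is an illustration under stated assumptions, not the authors' implementation: the `NvmeCommand` fields, the bucketed `tokenize` scheme, the 4096-byte block size, and the patch length are all hypothetical choices, since the abstract does not specify the tokenization or embedding details.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed logical block size in bytes; real devices may differ.
BLOCK_SIZE = 4096


@dataclass
class NvmeCommand:
    """Hypothetical, simplified view of one NVMe I/O command."""
    opcode: str          # "read" or "write"
    lba: int             # starting logical block address
    num_blocks: int      # number of blocks accessed
    is_ransomware: bool  # ground-truth label, known only for training traces


def tokenize(cmd: NvmeCommand) -> Tuple[int, int, int]:
    """Map a command to small integer ids (an illustrative scheme only)."""
    opcode_id = {"read": 0, "write": 1}[cmd.opcode]
    size_bucket = min(cmd.num_blocks.bit_length(), 15)  # log2-style size bucket
    lba_bucket = min(cmd.lba.bit_length(), 63)          # coarse address bucket
    return (opcode_id, size_bucket, lba_bucket)


def command_level_targets(cmds: List[NvmeCommand]) -> List[int]:
    """One binary label per command, as in token classification."""
    return [int(c.is_ransomware) for c in cmds]


def patch_level_targets(cmds: List[NvmeCommand], patch_len: int) -> List[int]:
    """Bytes accessed by ransomware within each consecutive patch of commands."""
    targets = []
    for start in range(0, len(cmds), patch_len):
        patch = cmds[start:start + patch_len]
        targets.append(
            sum(c.num_blocks * BLOCK_SIZE for c in patch if c.is_ransomware)
        )
    return targets


trace = [
    NvmeCommand("read", 0, 8, False),
    NvmeCommand("write", 1024, 8, True),
    NvmeCommand("read", 2048, 16, True),
    NvmeCommand("write", 4096, 1, False),
]
tokens = [tokenize(c) for c in trace]
per_command = command_level_targets(trace)
per_patch = patch_level_targets(trace, patch_len=2)
```

In this sketch the command-level targets carry one label per command, while the patch-level targets aggregate the ransomware-accessed volume over fixed-size groups of commands, which mirrors the distinction between the two model types described above.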