LGCRFeb 7, 2025

Learning the Language of NVMe Streams for Ransomware Detection

arXiv:2502.05011v1h-index: 16
Originality Highly original
AI Analysis

This work addresses the problem of ransomware detection for data storage systems and their users, providing an incremental solution.

The authors tackled ransomware detection in NVMe command sequences, achieving improvements of up to 24% in missed-detection rate, 66% in data loss prevention, and 84% in identifying data accessed by ransomware. Their models outperformed state-of-the-art tabular methods.

We apply language modeling techniques to detect ransomware activity in NVMe command sequences. We design and train two types of transformer-based models: the Command-Level Transformer (CLT) performs in-context token classification to determine whether individual commands are initiated by ransomware, and the Patch-Level Transformer (PLT) predicts the volume of data accessed by ransomware within a patch of commands. We present both model designs and the corresponding tokenization and embedding schemes and show that they improve over state-of-the-art tabular methods by up to 24% in missed-detection rate, 66% in data loss prevention, and 84% in identifying data accessed by ransomware.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes