LG | CR | Jun 23, 2021

Learning Explainable Representations of Malware Behavior

arXiv:2106.12328v1 | 6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses malware detection and explanation for cybersecurity practitioners, but it is incremental: it applies existing methods, such as integrated gradients, to a specific domain.

The paper tackles two problems: identifying malware in network telemetry logs and explaining the behavioral patterns behind each detection. Its neural network system processes sequences of network events to detect specific threats such as njRAT, and uses integrated gradients to explain the result.

We address the problems of identifying malware in network telemetry logs and providing indicators of compromise: comprehensible explanations of behavioral patterns that identify the threat. In our system, an array of specialized detectors abstracts network-flow data into comprehensible network events in a first step. We develop a neural network that processes this sequence of events and identifies specific threats, malware families and broad categories of malware. We then use the integrated-gradients method to highlight events that jointly constitute the characteristic behavioral pattern of the threat. We compare network architectures based on CNNs, LSTMs, and transformers, and explore the efficacy of unsupervised pre-training experimentally on large-scale telemetry data. We demonstrate how this system detects njRAT and other malware based on behavioral patterns.
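To make the integrated-gradients step in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's system: the model is a toy logistic score over three hypothetical event indicators (`event_a`, `event_b`, `event_c`) rather than a CNN, LSTM, or transformer over event sequences, and the weights, bias, and all-zero baseline are assumptions chosen only for illustration. The path integral is approximated with a midpoint Riemann sum.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy threat score: logistic regression over event features (assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy score with respect to the input x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients along the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Hypothetical event indicators and parameters, for illustration only.
EVENT_NAMES = ["event_a", "event_b", "event_c"]
weights = [2.0, -0.5, 1.5]
bias = -1.0
x = [1.0, 1.0, 0.0]          # event_a and event_b observed, event_c absent
baseline = [0.0, 0.0, 0.0]   # "no events observed" reference input

attributions = integrated_gradients(x, baseline, weights, bias)
for name, value in zip(EVENT_NAMES, attributions):
    print(f"{name}: {value:+.4f}")
```

In this sketch the attributions sum to approximately the difference between the score at `x` and the score at the baseline (the completeness property of integrated gradients), and an event that does not differ from the baseline receives zero attribution. Ranking events by attribution is one way to read off which events jointly drive a detection.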

Code Implementations: 1 repo
