The Shape of Alerts: Detecting Malware Using Distributed Detectors by Robustly Amplifying Transient Correlations
This paper addresses malware detection in large-scale networks and reports fewer false positives and earlier alerts than commercial anti-virus products.
The paper tackles the problem of detecting malware by aggregating per-machine detectors into a robust global detector. On 5 years of Symantec detector logs, it reduces false positives from about 1 million to about 110,000 and raises alerts 345 days earlier, on average, than commercial anti-virus products.
We introduce Shape-GD, a new malware detector that aggregates per-machine detectors into a robust global detector. Shape-GD is based on two insights:
1. Structural: actions by nodes, such as visiting a website (a waterhole attack), correlate well with malware spread and create dynamic neighborhoods of nodes that were exposed to the same attack vector. However, neighborhood sizes vary unpredictably, so a global alert must aggregate an unpredictable number of local detectors' outputs.
2. Statistical: the feature vectors corresponding to true and false positives of local detectors have markedly different conditional distributions, i.e., their shapes differ.
The shape of a neighborhood can therefore identify infected neighborhoods without having to estimate neighborhood sizes. On 5 years of Symantec detectors' logs, Shape-GD reduces false positives from ~1M to ~110K and raises alerts 345 days (on average) before commercial anti-virus products. In a waterhole attack simulated using Yahoo web-service logs, Shape-GD detects infected machines when only ~100 of ~550K are compromised.
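
The statistical insight can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: each local alert is reduced to a single score in [0, 1] (the abstract speaks of feature vectors), a neighborhood's scores are binned into a normalized histogram, and that histogram is compared to a reference histogram of known false positives using a one-dimensional earth mover's distance. The names `histogram`, `shape_distance`, and `is_suspicious`, the bin count, and the threshold of 0.1 are all hypothetical.

```python
import random


def histogram(scores, bins=20):
    """Normalized histogram of alert scores assumed to lie in [0, 1]."""
    if not scores:
        raise ValueError("need at least one alert score")
    counts = [0] * bins
    for s in scores:
        # Clamp out-of-range scores (and s == 1.0) into the edge bins.
        idx = max(0, min(int(s * bins), bins - 1))
        counts[idx] += 1
    total = len(scores)
    return [c / total for c in counts]


def shape_distance(scores, reference_scores, bins=20):
    """1-D earth mover's distance between two binned score distributions.

    Both histograms are normalized, so the result depends on the shape of
    the neighborhood's distribution and not on how many alerts it holds.
    """
    p = histogram(scores, bins)
    q = histogram(reference_scores, bins)
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist / bins  # each bin has width 1 / bins


def is_suspicious(scores, reference_scores, threshold=0.1):
    """Flag a neighborhood whose shape is far from the false-positive shape.

    The threshold is a hypothetical value chosen for this synthetic demo.
    """
    return shape_distance(scores, reference_scores) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: false positives skew low, true positives skew high.
    reference = [rng.betavariate(2, 5) for _ in range(5000)]
    benign = [rng.betavariate(2, 5) for _ in range(300)]
    infected_small = [rng.betavariate(5, 2) for _ in range(50)]
    infected_large = [rng.betavariate(5, 2) for _ in range(5000)]
    for name, nb in [("benign", benign),
                     ("infected (50 alerts)", infected_small),
                     ("infected (5000 alerts)", infected_large)]:
        print(name, round(shape_distance(nb, reference), 3),
              is_suspicious(nb, reference))
```

Because the histograms are normalized, the same threshold applies to a neighborhood of 50 alerts and one of 5,000, which mirrors the claim that infected neighborhoods can be identified without estimating neighborhood size. The synthetic beta-distributed scores stand in for real detector output and say nothing about the distributions observed in the Symantec or Yahoo data.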