LG CR GT ML · Apr 21, 2018

Is feature selection secure against training data poisoning?

Huang Xiao, Battista Biggio, Gavin Brown, Giorgio Fumera, Claudia Eckert, Fabio Roli

arXiv:1804.07933v1 · 31.8 · 443 citations

Originality: Incremental advance

AI Analysis

This work addresses security risks in machine learning for domains such as malware detection, where attackers can subvert feature selection. It is an incremental advance because it analyzes existing methods under new adversarial conditions.

The paper investigates how vulnerable the feature selection methods LASSO, ridge regression, and the elastic net are to training data poisoning attacks. It shows that attackers can significantly compromise these methods: in malware detection, for example, inserting fewer than 5% poisoned training samples reduces LASSO to nearly random feature selection.
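For reference, the three methods can be written as one regularized least-squares objective. This is a standard formulation assumed here for illustration; the paper's exact loss and scaling may differ. Setting $\rho = 1$ gives LASSO, $\rho = 0$ gives ridge regression, and $0 < \rho < 1$ gives the elastic net. The selected features are those with the largest $|w_k|$; the $\ell_1$ term drives many weights exactly to zero, while ridge only shrinks them.

```latex
\min_{\mathbf{w}} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \mathbf{w}^\top \mathbf{x}_i \right)^2
  + \lambda \left( \rho \, \lVert \mathbf{w} \rVert_1 + \frac{1 - \rho}{2} \, \lVert \mathbf{w} \rVert_2^2 \right)
```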

Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.
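To make the robustness measurement concrete, here is a minimal, dependency-free Python sketch. It assumes a plain least-squares loss with no intercept and uses cyclic coordinate descent as the solver, which is a standard choice and not necessarily the authors'. The helper names `elastic_net`, `top_k`, and `kuncheva_index` are hypothetical. Kuncheva's consistency index is assumed here as the stability measure between two feature subsets: it equals 1 for identical subsets and is close to 0 when the overlap is no better than chance, so a value near 0 would correspond to the "almost random choices of feature sets" mentioned in the abstract. The poisoning attack itself is not reproduced.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t (exact zero inside [-t, t])."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, rho=1.0, n_iter=200):
    """Cyclic coordinate descent for
        (1/2n) * ||y - X w||^2 + lam * (rho * ||w||_1 + (1 - rho)/2 * ||w||_2^2).
    rho=1 gives LASSO, rho=0 gives ridge. No intercept: assumes centered data."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    resid = list(y)  # current residual y - X w (w starts at zero)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that excludes feature j.
            z = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(z, lam * rho) / (col_sq[j] + lam * (1.0 - rho))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def top_k(w, k):
    """Indices of the k largest-magnitude weights, i.e. the selected feature subset."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]


def kuncheva_index(a, b, d):
    """Kuncheva consistency index of two size-k subsets of d features:
    1.0 for identical subsets, about 0 for the overlap expected by chance."""
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k or not 0 < k < d:
        raise ValueError("subsets must have equal size k with 0 < k < d")
    r = len(a & b)
    return (r * d - k * k) / (k * (d - k))


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 10, 200
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    # Hypothetical synthetic data: only features 0 and 1 carry signal.
    y = [3.0 * x[0] - 2.0 * x[1] + rng.gauss(0.0, 0.1) for x in X]
    clean = top_k(elastic_net(X, y, lam=0.1, rho=1.0), k=2)
    print("LASSO selection:", sorted(clean))
    print("stability vs. true subset:", kuncheva_index(clean, [0, 1], d))
```

To mirror the experiment in spirit, one would fit `elastic_net` on the clean training set and again on the same set with a small fraction of attacker-crafted points appended, then compare the two `top_k` subsets with `kuncheva_index`. Crafting those points requires the paper's attack, which this sketch does not implement.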

