Mitigation of Adversarial Attacks through Embedded Feature Selection
This work addresses security concerns for machine learning applications by exploring a trade-off between feature selection and robustness against adversarial attacks.
The paper tackles the vulnerability of machine learning systems to adversarial examples by showing that using a reduced set of features increases the relative distortion needed for successful attacks and makes adversarial examples statistically more distinct from genuine ones, though it may reduce accuracy.
Machine learning has become one of the main components for task automation in many application domains. Despite the advancements and impressive achievements of machine learning, it has been shown that learning algorithms can be compromised by attackers both at training and test time. Machine learning systems are especially vulnerable to adversarial examples where small perturbations added to the original data points can produce incorrect or unexpected outputs in the learning algorithms at test time. Mitigation of these attacks is hard as adversarial examples are difficult to detect. Existing related work states that the security of machine learning systems against adversarial examples can be weakened when feature selection is applied to reduce the systems' complexity. In this paper, we empirically disprove this idea, showing that the relative distortion that the attacker has to introduce to succeed in the attack is greater when the target is using a reduced set of features. We also show that the minimal adversarial examples differ statistically more strongly from genuine examples with a lower number of features. However, reducing the feature count can negatively impact the system's performance. We illustrate the trade-off between security and accuracy with specific examples. We propose a design methodology to evaluate the security of machine learning classifiers with embedded feature selection against adversarial examples crafted using different attack strategies.