On Visual Hallmarks of Robustness to Adversarial Malware
This work addresses model interpretability for researchers and practitioners in adversarial machine learning; its contribution is incremental, since it builds on existing visualization techniques.
The paper tackles the challenge of interpreting adversarially hardened models by developing visual methods to discern robust generalization, confirming that loss landscape flatness extends to such models and providing tools to examine global robustness.
A central challenge of adversarial learning is interpreting the resulting hardened model. In this contribution, we ask how robust generalization can be visually discerned and whether a concise view of the interactions between a hardened decision map and input samples is possible. We first provide a means of visually comparing a hardened model's loss behavior on the adversarial variants generated during training with its loss behavior on adversarial variants generated from other sources. This allows us to confirm that the association between observed loss-landscape flatness and generalization, previously seen with naturally trained models, extends to adversarially hardened models and robust generalization. To complement these means of interpreting model parameter robustness, we also use self-organizing maps to superimpose adversarial and natural variants on a model's decision space, allowing the model's global robustness to be examined comprehensively.
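
The loss comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a toy linear classifier with a logistic loss, synthetic data in place of malware features, and a single random direction in parameter space rescaled to the norm of the weights. The names `logistic_loss`, `random_direction`, `loss_slice`, and `flatness` are hypothetical helpers introduced only for illustration. The idea is to trace the loss along the same parameter-space slice for two sample sets (variants seen during training versus variants from another source) and compare how sharply each curve rises.

```python
import numpy as np


def logistic_loss(w, X, y):
    """Mean logistic loss of a linear model; labels y are in {-1, +1}."""
    margins = y * (X @ w)
    # log(1 + exp(-margin)), computed in a numerically stable way
    return float(np.mean(np.logaddexp(0.0, -margins)))


def random_direction(w, rng):
    """Random direction rescaled to the norm of w (an assumed normalization)."""
    d = rng.normal(size=w.shape)
    return d * (np.linalg.norm(w) / np.linalg.norm(d))


def loss_slice(w, X, y, direction, alphas):
    """Loss evaluated at w + alpha * direction for each alpha."""
    return np.array([logistic_loss(w + a * direction, X, y) for a in alphas])


def flatness(losses):
    """Crude proxy: largest loss increase relative to the center of the slice."""
    center = losses[len(losses) // 2]
    return float(np.max(losses) - center)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 20
    w = rng.normal(size=dim)  # stands in for trained (hardened) parameters

    # Synthetic stand-ins: variants "seen in training" and shifted
    # variants "from another source" (both are assumptions, not real data).
    X_seen = rng.normal(size=(200, dim))
    y_seen = np.where(X_seen @ w >= 0, 1.0, -1.0)
    X_other = X_seen + 0.5 * rng.normal(size=X_seen.shape)
    y_other = y_seen

    direction = random_direction(w, rng)
    alphas = np.linspace(-1.0, 1.0, 21)  # center index 10 is alpha = 0

    seen_curve = loss_slice(w, X_seen, y_seen, direction, alphas)
    other_curve = loss_slice(w, X_other, y_other, direction, alphas)

    print("flatness proxy (training variants):", flatness(seen_curve))
    print("flatness proxy (other-source variants):", flatness(other_curve))
```

Plotting `seen_curve` and `other_curve` against `alphas` gives the kind of side-by-side view the abstract describes; a smaller flatness proxy on the other-source curve would be consistent with, though not proof of, robust generalization.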