Understanding Failures of Deep Networks via Robust Feature Extraction
This work addresses the problem of understanding and debugging deep network failures for machine learning engineers, offering an incremental approach to failure analysis.
This paper introduces a method to characterize and explain deep network failures by identifying visual attributes that lead to poor performance, leveraging a separate robust model for feature extraction instead of crowdsourced labels. The method is evaluated on ImageNet, demonstrating its effectiveness in discovering failure modes and assisting engineers with error analysis.
Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances. We introduce and study a method aimed at characterizing and explaining failures by identifying visual attributes whose presence or absence results in poor performance. In contrast to previous work that relies upon crowdsourced labels for visual attributes, we leverage the representation of a separate robust model to extract interpretable features and then harness these features to identify failure modes. We further propose a visualization method aimed at enabling humans to understand the meaning encoded in such features, and we test the comprehensibility of the features. An evaluation of the methods on the ImageNet dataset demonstrates that: (i) the proposed workflow is effective for discovering important failure modes, (ii) the visualization techniques help humans to understand the extracted features, and (iii) the extracted insights can assist engineers with error analysis and debugging.
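To make the idea of "attributes whose presence or absence results in poor performance" concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes that feature activations from a robust model have already been extracted into a matrix and that per-image correctness of the model under study is known. The function name `rank_failure_features`, the quantile threshold used to decide whether a feature is "present", and the `min_group` cutoff are all hypothetical choices made for illustration.

```python
import numpy as np

def rank_failure_features(features, correct, quantile=0.8, min_group=5):
    """Rank (feature, presence/absence) groups by how far their error rate
    exceeds the overall error rate.

    features: (n_samples, n_features) activations, assumed to come from a
              separate robust model (extraction is not shown here).
    correct:  (n_samples,) booleans, True where the studied model was right.
    """
    features = np.asarray(features, dtype=float)
    failed = ~np.asarray(correct, dtype=bool)
    base_error = failed.mean()

    results = []
    for j in range(features.shape[1]):
        # Assumption: a feature counts as "present" when its activation is
        # above the chosen quantile of that feature over the dataset.
        threshold = np.quantile(features[:, j], quantile)
        present = features[:, j] > threshold
        for label, mask in (("present", present), ("absent", ~present)):
            if mask.sum() < min_group:
                continue  # skip groups too small to give a stable error rate
            gap = failed[mask].mean() - base_error
            results.append((j, label, gap, int(mask.sum())))

    # Largest error-rate gap first: these groups are candidate failure modes.
    results.sort(key=lambda r: r[2], reverse=True)
    return base_error, results

# Synthetic illustration: failures occur exactly where feature 1 is high.
rng = np.random.default_rng(0)
demo_features = rng.random((200, 3))
demo_correct = demo_features[:, 1] <= np.quantile(demo_features[:, 1], 0.8)

base_error, ranking = rank_failure_features(demo_features, demo_correct)
print("base error rate:", base_error)
print("top candidate (feature, state, gap, group size):", ranking[0])
```

On the synthetic data, the group where feature 1 is present should rank first, since the failures were constructed to coincide with it. A real analysis would also need the visualization step described above, so that a human can judge what each highly ranked feature represents.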