Interpretable CNNs for Object Classification
This addresses the need for interpretability in CNNs for researchers and practitioners, though it is incremental as it builds on existing CNN architectures.
The paper tackles the problem of understanding what patterns CNNs extract for object classification by proposing a method to learn interpretable filters that encode specific object parts without extra supervision, and experiments show these filters are more semantically meaningful than traditional ones.
This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part. Our method does not require additional annotations of object parts or textures for supervision. Instead, we use the same training data as traditional CNNs. Our method automatically assigns each interpretable filter in a high conv-layer with an object part of a certain category during the learning process. Such explicit knowledge representations in conv-layers of CNN help people clarify the logic encoded in the CNN, i.e., answering what patterns the CNN extracts from an input image and uses for prediction. We have tested our method using different benchmark CNNs with various structures to demonstrate the broad applicability of our method. Experiments have shown that our interpretable filters are much more semantically meaningful than traditional filters.