Cascade Learning Localises Discriminant Features in Visual Scene Classification
This work addresses the need for trustworthy automated decisions in medical and visual domains by improving feature localisation, although the contribution is incremental because it builds on existing learning strategies.
The paper tackles the problem of interpretability in deep convolutional neural networks by comparing two learning paradigms, end-to-end learning and cascade learning, for localising discriminative features. It finds that cascade learning yields more localised features and, in its best result, outperforms end-to-end learning by 2% in mAP on the YOLO object detection framework.
The lack of interpretability of deep convolutional neural networks (DCNNs) is a well-known problem, particularly in the medical domain, where clinicians want trustworthy automated decisions. One way to improve trust is to demonstrate the localisation of feature representations with respect to expert-labelled regions of interest. In this work, we investigate the localisation of features learned via two different learning paradigms and demonstrate the superiority of one of them with respect to localisation. Our analysis on medical and natural datasets shows that the traditional end-to-end (E2E) learning strategy has a limited ability to localise discriminative features across multiple network layers. We show that a layer-wise learning strategy, namely cascade learning (CL), results in more localised features. Considering localisation accuracy, we show not only that CL outperforms E2E but also that it is a promising method for predicting regions. On the YOLO object detection framework, our best result shows that CL outperforms the E2E scheme by $2\%$ in mAP.
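The abstract does not say how localisation with respect to a region of interest is quantified. As a minimal sketch, assuming one simple possibility (the share of a feature map's absolute activation that falls inside the labelled region), the idea can be illustrated as below. The function name `localisation_score`, the binary-mask representation of the region, and the toy maps are all hypothetical and are not taken from the paper.

```python
def localisation_score(activation, roi_mask):
    """Fraction of total absolute activation that lies inside the ROI.

    ASSUMPTION: this is an illustrative measure, not the paper's metric.
    activation: 2-D list of floats (a feature or saliency map).
    roi_mask:   2-D list of 0/1 with the same shape (1 = inside the region).
    Returns a value in [0, 1], or 0.0 if the map has no activation at all.
    """
    if len(activation) != len(roi_mask):
        raise ValueError("activation and roi_mask must have the same shape")
    inside = 0.0
    total = 0.0
    for act_row, mask_row in zip(activation, roi_mask):
        if len(act_row) != len(mask_row):
            raise ValueError("activation and roi_mask must have the same shape")
        for value, in_roi in zip(act_row, mask_row):
            total += abs(value)
            if in_roi:
                inside += abs(value)
    return inside / total if total > 0 else 0.0


# Toy 3x3 example: the ROI is the top-left 2x2 block.
roi = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 0]]
# A map whose activation is concentrated inside the ROI.
focused = [[4.0, 3.0, 0.0],
           [2.0, 1.0, 0.0],
           [0.0, 0.0, 0.0]]
# A map whose activation is spread evenly over the whole image.
diffuse = [[1.0, 1.0, 1.0] for _ in range(3)]

focused_score = localisation_score(focused, roi)  # 10/10 of the mass is inside
diffuse_score = localisation_score(diffuse, roi)  # 4/9 of the mass is inside
```

Under this assumed measure, a higher score means the features are more tightly concentrated on the labelled region. Comparing such scores layer by layer for E2E-trained and CL-trained networks would be one way to express the kind of comparison the abstract describes.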