TDAPNet: Prototype Network with Recurrent Top-Down Attention for Robust Object Classification under Partial Occlusion
This work addresses a critical issue for computer vision applications where objects are often partially occluded, though it is incremental in combining existing techniques like prototypes and attention.
The paper tackles the problem of object classification under partial occlusion, where deep neural networks suffer from performance drops due to training-testing inconsistency. The proposed TDAPNet integrates prototype learning, partial matching, and top-down attention regulation, significantly improving robustness on occluded MNIST and PASCAL3D+ datasets.
Despite deep convolutional neural networks' great success in object classification, it suffers from severe generalization performance drop under occlusion due to the inconsistency between training and testing data. Because of the large variance of occluders, our goal is a model trained on occlusion-free data while generalizable to occlusion conditions. In this work, we integrate prototypes, partial matching and top-down attention regulation into deep neural networks to realize robust object classification under occlusion. We first introduce prototype learning as its regularization encourages compact data clusters, which enables better generalization ability under inconsistent conditions. Then, attention map at intermediate layer based on feature dictionary and activation scale is estimated for partial matching, which sifts irrelevant information out when comparing features with prototypes. Further, inspired by neuroscience research that reveals the important role of feedback connection for object recognition under occlusion, a top-down feedback attention regulation is introduced into convolution layers, purposefully reducing the contamination by occlusion during feature extraction stage. Our experiment results on partially occluded MNIST and vehicles from the PASCAL3D+ dataset demonstrate that the proposed network significantly improves the robustness of current deep neural networks under occlusion. Our code will be released.