HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition
HD-CNN addresses the challenge of large-scale visual recognition in computer vision by improving classification accuracy, though it is an incremental advance over existing CNNs.
The paper tackles the problem of uneven visual separability in image classification by proposing HD-CNN, a hierarchical deep convolutional neural network that embeds CNNs into a category hierarchy to separate easy classes with a coarse classifier and distinguish difficult ones with fine classifiers. It achieves state-of-the-art results on CIFAR100 and ImageNet, lowering top-1 error of standard CNNs by up to 3.1%.
In image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNNs) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of categories. In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers. During HD-CNN training, component-wise pretraining is followed by global finetuning with a multinomial logistic loss regularized by a coarse category consistency term. In addition, conditional execution of fine category classifiers and layer parameter compression make HD-CNNs scalable for large-scale visual recognition. We achieve state-of-the-art results on both the CIFAR100 and the large-scale ImageNet 1000-class benchmark datasets. In our experiments, we build three different HD-CNNs, which lower the top-1 error of the standard CNNs by 2.65%, 3.1% and 1.1%, respectively.
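The coarse-to-fine scheme and the regularized loss described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract gives no formulas, so several details here are assumptions: the final prediction is taken to be an average of fine-classifier outputs weighted by coarse probabilities, conditional execution is modeled as skipping fine classifiers whose coarse probability falls below a hypothetical `threshold`, and the consistency term is assumed to be a squared difference between the mean coarse prediction and a target coarse distribution. The names `hd_cnn_predict`, `regularized_loss`, `threshold`, and `lam` are illustrative only.

```python
import numpy as np

def hd_cnn_predict(x, coarse_probs, fine_classifiers, threshold=0.1):
    """Combine fine-classifier outputs, weighted by coarse probabilities.

    coarse_probs: shape (K,), probabilities over K coarse categories.
    fine_classifiers: K callables, each mapping x to probabilities over
        the N fine classes (stand-ins for the fine category CNNs).
    threshold: assumed cutoff; fine classifiers below it are never run.
    """
    coarse_probs = np.asarray(coarse_probs, dtype=float)
    # Conditional execution: only evaluate the likely coarse branches.
    active = np.flatnonzero(coarse_probs >= threshold)
    if active.size == 0:
        active = np.array([int(np.argmax(coarse_probs))])
    # Renormalize the weights over the branches that were executed.
    weights = coarse_probs[active] / coarse_probs[active].sum()
    combined = sum(
        w * np.asarray(fine_classifiers[k](x), dtype=float)
        for w, k in zip(weights, active)
    )
    return combined, active.tolist()

def regularized_loss(fine_probs, labels, coarse_probs, target_coarse, lam=1.0):
    """Multinomial logistic loss plus an assumed coarse consistency term.

    fine_probs: shape (n, N), predicted fine-class probabilities.
    labels: shape (n,), integer ground-truth fine classes.
    coarse_probs: shape (n, K), predicted coarse probabilities.
    target_coarse: shape (K,), target coarse category distribution.
    """
    n = len(labels)
    nll = -np.mean(np.log(fine_probs[np.arange(n), labels]))
    # Assumed form: penalize drift of the mean coarse prediction.
    drift = np.asarray(target_coarse) - np.asarray(coarse_probs).mean(axis=0)
    return nll + 0.5 * lam * np.sum(drift ** 2)
```

With three coarse categories and coarse probabilities `[0.7, 0.25, 0.05]`, only the first two fine classifiers would run under the default threshold, and their outputs would be mixed with weights proportional to 0.7 and 0.25. Skipping low-probability branches is what keeps the cost from growing with the number of fine classifiers.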