Vehicle classification using ResNets, localisation and spatially-weighted pooling
This work addresses fine-grained vehicle classification for computer vision applications. It is incremental: it builds on existing ResNet and pooling techniques rather than introducing a new paradigm.
The paper tackles fine-grained vehicle classification by modifying ResNet architectures with Spatially Weighted Pooling and a localisation step. It reports a top-1 accuracy of 96.351% on the Comprehensive Cars dataset, an improvement of up to 3.7 percentage points for ResNet-50.
We investigate whether ResNet architectures can outperform more traditional Convolutional Neural Networks on the task of fine-grained vehicle classification. We train and test ResNet-18, ResNet-34 and ResNet-50 on the Comprehensive Cars dataset without pre-training on other datasets. We then modify the networks to use Spatially Weighted Pooling. Finally, we add a localisation step before the classification process, using a network based on ResNet-50. We find that Spatially Weighted Pooling and localisation each improve the classification accuracy of ResNet-50. Spatially Weighted Pooling increases accuracy by 1.5 percentage points and localisation increases accuracy by 3.4 percentage points. Using both increases accuracy by 3.7 percentage points, giving a top-1 accuracy of 96.351% on the Comprehensive Cars dataset. Our method achieves higher accuracy than a range of methods, including those that use traditional CNNs. However, our method does not perform quite as well as pre-trained networks that use Spatially Weighted Pooling.
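To make the pooling idea concrete, the following is a minimal sketch of Spatially Weighted Pooling in plain Python. It is not the implementation used in this work. It assumes the common formulation in which K spatial masks each take a weighted sum over every one of the C feature maps, producing a K*C vector. The function name `spatially_weighted_pool`, the toy feature maps and the fixed mask values are hypothetical; in a trained network the masks would be learned parameters.

```python
from typing import List

Matrix = List[List[float]]  # one H x W grid of values


def spatially_weighted_pool(features: List[Matrix], masks: List[Matrix]) -> List[float]:
    """Pool C feature maps with K spatial masks into a vector of length K*C.

    Each output entry is the sum over all spatial positions of
    mask[h][w] * feature_map[h][w]. Illustrative only: real masks are learned.
    """
    pooled = []
    for mask in masks:              # loop over the K masks
        for fmap in features:       # loop over the C channels
            total = 0.0
            for mask_row, feat_row in zip(mask, fmap):
                for m, f in zip(mask_row, feat_row):
                    total += m * f
            pooled.append(total)
    return pooled


# Two toy 2x2 feature maps (C = 2).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
# K = 2 hypothetical masks: a uniform mask, which reproduces global average
# pooling, and a mask that attends only to the top-left position.
masks = [
    [[0.25, 0.25], [0.25, 0.25]],
    [[1.0, 0.0], [0.0, 0.0]],
]

print(spatially_weighted_pool(features, masks))  # [2.5, 0.5, 1.0, 0.0]
```

The uniform mask shows that global average pooling is a special case of this scheme. Non-uniform masks let the network weight some image regions more heavily than others, which is the property that is useful for fine-grained classes.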