Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum
This work addresses inefficient knowledge transfer from image classification to object detection, a problem relevant to both researchers and practitioners, although its contribution is incremental.
The paper analyzes the eigenspectrum dynamics of feature maps in object detectors to understand the effects of ImageNet pre-training, showing that pre-trained and scratch-trained models behave differently despite similar accuracy, and proposes a method to reduce parameters by ~27% in ResNet-50 without losing accuracy.
ImageNet pre-training has long been regarded as essential for training accurate object detectors. Recently, it has been shown that object detectors trained from randomly initialized weights can be on par with those fine-tuned from ImageNet pre-trained models. However, the effects of pre-training, and the differences it causes, are still not fully understood. In this paper, we analyze the eigenspectrum dynamics of the covariance matrix of each feature map in object detectors. Based on our analysis of ResNet-50, Faster R-CNN with FPN, and Mask R-CNN, we show that object detectors fine-tuned from ImageNet pre-trained models and those trained from scratch behave differently, even when both reach similar accuracy. Furthermore, we propose a method for automatically determining the widths (the numbers of channels) of object detectors based on the eigenspectrum. We train Faster R-CNN with FPN from randomly initialized weights and show that our method can reduce the number of parameters of ResNet-50 by ~27% without increasing Multiply-Accumulate operations or losing accuracy. Our results indicate that we should develop more appropriate methods for transferring knowledge from image classification to object detection (or other tasks).
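As a rough illustration of the quantity analyzed here, the sketch below computes the eigenspectrum of the channel covariance matrix of a single feature map using NumPy. The abstract does not give the exact estimator or the width-selection rule, so the function names, the (N, C, H, W) layout, and the cumulative-variance threshold are assumptions made for illustration only, not the paper's method.

```python
import numpy as np

def feature_eigenspectrum(features):
    """Eigenvalues (descending) of the C x C channel covariance of a feature map.

    `features` is assumed to have shape (N, C, H, W); every spatial position
    of every image is treated as one C-dimensional sample.
    """
    n, c, h, w = features.shape
    x = features.transpose(1, 0, 2, 3).reshape(c, -1)   # (C, N*H*W)
    x = x - x.mean(axis=1, keepdims=True)               # center each channel
    cov = x @ x.T / (x.shape[1] - 1)                    # unbiased covariance
    eig = np.linalg.eigvalsh(cov)[::-1]                 # eigvalsh is ascending
    return np.clip(eig, 0.0, None)                      # drop tiny negative noise

def channels_for_variance(eig, keep=0.99):
    """Hypothetical width heuristic (an assumption, not the paper's rule):
    the smallest number of leading eigenvalues whose sum reaches `keep`
    of the total variance."""
    ratio = np.cumsum(eig) / eig.sum()
    return int(np.searchsorted(ratio, keep) + 1)
```

Tracking `feature_eigenspectrum` for each layer across training checkpoints would give one possible view of the "eigenspectrum dynamics". A layer whose spectrum concentrates in a few leading eigenvalues is, under this heuristic, a candidate for a smaller width.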