SFPN: Synthetic FPN for Object Detection
This work addresses a specific bottleneck in object detection for computer vision applications, representing an incremental improvement over existing FPN methods.
The paper tackles the large down-scaling gap between adjacent levels of a Feature Pyramid Network (FPN) in object detection. It proposes a Synthetic Fusion Pyramid Network (SFPN) that inserts synthetic layers between the original FPN layers, which improves AP scores for both large and light-weight backbones.
FPN (Feature Pyramid Network) has become a basic component of most state-of-the-art one-stage object detectors. Many previous studies have shown that FPN captures better multi-scale feature maps, which describe objects of different sizes more precisely. However, for most backbones such as VGG, ResNet, or DenseNet, the feature map at each layer is reduced to a quarter of the previous layer's area (half the width and half the height) by a pooling operation or a convolution with stride 2. This down-scaling-by-2 gap is large and prevents the FPN from fusing features smoothly. This paper proposes a new SFPN (Synthetic Fusion Pyramid Network) architecture that creates various synthetic layers between the layers of the original FPN, so that light-weight CNN backbones extract objects' visual features more accurately. Finally, experiments show that the SFPN architecture improves AP scores over the baseline for both large backbones (VGG16, ResNet50) and light-weight backbones such as MobileNetV2.
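To make the idea of a synthetic layer concrete, the following is a minimal sketch of one possible reading, not the paper's implementation. It takes two adjacent pyramid levels separated by a stride-2 gap, resizes both to an intermediate resolution, and fuses them. The helper names `resize_nearest` and `synthetic_layer`, the midpoint resolution, the nearest-neighbour resizing, and the element-wise average are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2-D feature map given as a list of rows."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def synthetic_layer(fine, coarse):
    """Fuse two adjacent pyramid levels into one intermediate-resolution map.

    Assumption: the intermediate size is the midpoint between the two levels,
    and fusion is an element-wise average. Neither choice comes from the paper.
    """
    mid_h = (len(fine) + len(coarse)) // 2
    mid_w = (len(fine[0]) + len(coarse[0])) // 2
    down = resize_nearest(fine, mid_h, mid_w)   # shrink the finer level
    up = resize_nearest(coarse, mid_h, mid_w)   # enlarge the coarser level
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(down, up)]


# Toy single-channel levels: 8x8 (fine) and 4x4 (coarse), i.e. a stride-2 gap.
fine = [[1.0] * 8 for _ in range(8)]
coarse = [[3.0] * 4 for _ in range(4)]
mid = synthetic_layer(fine, coarse)
print(len(mid), len(mid[0]))  # 6 6
```

In this toy case the synthetic map is 6x6, sitting between the 8x8 and 4x4 levels, so the pyramid steps 8, 6, 4 rather than jumping directly from 8 to 4. A real detector would operate on multi-channel tensors and would likely use learned convolutions in the fusion step.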