NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
This work automates the design of feature pyramid architectures for object detection, improving the tradeoff between accuracy and latency in settings that include mobile vision.
The paper replaces manual design of feature pyramid networks for object detection with Neural Architecture Search, which discovers a scalable architecture named NAS-FPN. NAS-FPN improves mobile detection accuracy by 2 AP over SSDLite with MobileNetV2 and achieves 48.3 AP, surpassing Mask R-CNN with less computation time.
Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of a combination of top-down and bottom-up connections to fuse features across scales. NAS-FPN, combined with various backbone models in the RetinaNet framework, achieves a better accuracy and latency tradeoff compared to state-of-the-art object detection models. NAS-FPN improves mobile detection accuracy by 2 AP compared to the state-of-the-art SSDLite with MobileNetV2 model in [32] and achieves 48.3 AP, which surpasses Mask R-CNN [10] detection accuracy with less computation time.
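To make "fusing features across scales" concrete, the following is a minimal sketch of a single cross-scale connection. It is not the discovered NAS-FPN architecture. The helper names (upsample_nearest, max_pool, resize_to, fuse_sum), the use of single-channel 2D maps, the integer scale factors, and the choice of nearest-neighbor upsampling, max pooling, and element-wise sum are all assumptions made for illustration.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbor upsampling of a 2D map by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def max_pool(fmap, factor):
    """Non-overlapping max pooling of a 2D map by an integer factor."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(
                fmap[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


def resize_to(fmap, size):
    """Resize a square map to size x size (assumes an integer scale ratio)."""
    h = len(fmap)
    if size == h:
        return [list(row) for row in fmap]
    if size > h:
        return upsample_nearest(fmap, size // h)  # top-down direction
    return max_pool(fmap, h // size)              # bottom-up direction


def fuse_sum(a, b, size):
    """Bring two maps of different scales to a common size and sum them."""
    ra, rb = resize_to(a, size), resize_to(b, size)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(ra, rb)]


# Hypothetical inputs: a fine 8x8 map and a coarse 2x2 map fused at 4x4.
fine = [[1.0] * 8 for _ in range(8)]
coarse = [[2.0] * 2 for _ in range(2)]
fused = fuse_sum(fine, coarse, 4)
```

Here the fine map is pooled down (a bottom-up connection) and the coarse map is upsampled (a top-down connection) before the two are summed at an intermediate resolution. A search space covering all cross-scale connections would, under this simplified view, let the search choose which pair of maps to fuse and at which output size.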