CV | Nov 21, 2019

Learning Spatial Fusion for Single-Shot Object Detection

arXiv:1911.09516v2 | 20.7824 citations | Has Code

Originality: Highly original

AI Analysis

This work addresses scale inconsistency in object detection for computer vision applications, representing an incremental improvement over existing feature pyramid methods.

The paper tackles the inconsistency across feature scales in single-shot object detectors that rely on feature pyramids. It proposes an adaptively spatial feature fusion (ASFF) strategy and reports a state-of-the-art speed-accuracy trade-off on MS COCO, reaching up to 43.9% AP at 29 FPS.

Pyramidal feature representation is the common practice to address the challenge of scale variation in object detection. However, the inconsistency across different feature scales is a primary limitation for the single-shot detectors based on feature pyramid. In this work, we propose a novel and data driven strategy for pyramidal feature fusion, referred to as adaptively spatial feature fusion (ASFF). It learns the way to spatially filter conflictive information to suppress the inconsistency, thus improving the scale-invariance of features, and introduces nearly free inference overhead. With the ASFF strategy and a solid baseline of YOLOv3, we achieve the best speed-accuracy trade-off on the MS COCO dataset, reporting 38.1% AP at 60 FPS, 42.4% AP at 45 FPS and 43.9% AP at 29 FPS. The code is available at https://github.com/ruinmessi/ASFF
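The abstract describes the fusion only at a high level, so the following is a minimal sketch of one way spatially adaptive fusion can work, not the authors' implementation. It assumes that the feature maps from every pyramid level have already been resized to a common resolution, that each map has a single channel, and that the per-location weights come from a softmax over learned logits (so they are non-negative and sum to 1 at each position). The names `softmax` and `fuse_levels` are hypothetical, and the logits are passed in directly rather than learned.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(features, weight_logits):
    """Fuse several same-sized feature maps with per-location weights.

    features:      list of maps, one per pyramid level, each H x W
                   (assumed already resized to a common resolution).
    weight_logits: list of maps with the same shape; in a real detector
                   these would be predicted by learned layers (assumption).
    """
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Weights at this location are non-negative and sum to 1.
            weights = softmax([level[i][j] for level in weight_logits])
            fused[i][j] = sum(
                w * level[i][j] for w, level in zip(weights, features)
            )
    return fused


# Three 1x2 "feature maps", one per pyramid level.
features = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]

# Equal logits: every level contributes equally (a plain average).
equal = [[[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]]]
averaged = fuse_levels(features, equal)

# A large logit for level 0 at position (0, 0) suppresses the other levels there.
selective = [[[50.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]]]
filtered = fuse_levels(features, selective)
```

With equal logits the result is a plain average of the levels. A dominant logit at one location lets a single level pass through there while the others are suppressed, which is the sense in which conflicting information is "spatially filtered".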

