CV · Dec 4, 2019

Multiple Anchor Learning for Visual Object Detection

arXiv:1912.02252v1 · 104 citations
Originality: Incremental advance
AI Analysis

This paper addresses a bottleneck in visual object detection: classification and localization are hard to optimize jointly over a fixed set of anchor boxes. It offers an incremental but effective enhancement to existing methods.

The paper tackles the joint optimization of classification and localization in CNN-based object detectors. It proposes Multiple Anchor Learning (MAL), which selects anchors and jointly optimizes both modules. The authors report significant improvements over the baseline RetinaNet on the MS-COCO benchmark and state-of-the-art detection performance compared with recent methods.

Classification and localization are two pillars of visual object detectors. However, in CNN-based detectors, these two modules are usually optimized under a fixed set of candidate (or anchor) bounding boxes. This configuration significantly limits the possibility to jointly optimize classification and localization. In this paper, we propose a Multiple Instance Learning (MIL) approach that selects anchors and jointly optimizes the two modules of a CNN-based object detector. Our approach, referred to as Multiple Anchor Learning (MAL), constructs anchor bags and selects the most representative anchors from each bag. Such an iterative selection process is potentially NP-hard to optimize. To address this issue, we solve MAL by repetitively depressing the confidence of selected anchors by perturbing their corresponding features. In an adversarial selection-depression manner, MAL not only pursues optimal solutions but also fully leverages multiple anchors/features to learn a detection model. Experiments show that MAL improves the baseline RetinaNet with significant margins on the commonly used MS-COCO object detection benchmark and achieves new state-of-the-art detection performance compared with recent methods.
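
The selection-depression idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Anchor` class, the product-based `joint_score`, and the scalar `factor` used for depression are all assumptions made for illustration. In particular, the paper depresses confidence by perturbing features, whereas this sketch simply scales the selected anchor's score.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    name: str
    cls_conf: float  # classification confidence in [0, 1]
    loc_conf: float  # localization confidence in [0, 1]


def joint_score(anchor, depression):
    # Assumed scoring rule: product of the two confidences,
    # scaled by whatever depression has been applied so far.
    return anchor.cls_conf * anchor.loc_conf * depression.get(anchor.name, 1.0)


def select_and_depress(bag, rounds, factor=0.5):
    """Pick the top-scoring anchor in each round, then depress it.

    Depression here is a hypothetical stand-in for feature perturbation:
    the selected anchor's score is multiplied by `factor`, so that other
    anchors in the bag get a chance to be selected in later rounds.
    """
    depression = {}
    picked = []
    for _ in range(rounds):
        best = max(bag, key=lambda a: joint_score(a, depression))
        picked.append(best.name)
        depression[best.name] = depression.get(best.name, 1.0) * factor
    return picked


# One anchor bag for a single object (values are made up).
bag = [
    Anchor("a0", cls_conf=0.9, loc_conf=0.8),
    Anchor("a1", cls_conf=0.7, loc_conf=0.9),
    Anchor("a2", cls_conf=0.4, loc_conf=0.5),
]
picked = select_and_depress(bag, rounds=3)
print(picked)
```

With these made-up values, the first round picks `a0`, depression then lets `a1` win the second round, and `a0` returns in the third because its halved score still exceeds the others. The point of the sketch is only that depressing the current winner spreads selection across several anchors instead of locking onto one.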

Code Implementations · 3 repos
