Multiple Instance Reinforcement Learning for Efficient Weakly-Supervised Detection in Images
This work addresses the high computational and annotation costs of visual detection systems in practical applications. It is an incremental contribution that combines existing techniques in a new way.
The paper aims to reduce both computational cost and annotation effort in visual detection. It proposes a weakly supervised, segmentation-based approach for learning detectors from approximate localization signals, and a reinforcement learning-based sequential search method that achieves performance similar to exhaustive search at a fraction of the computational cost.
State-of-the-art visual recognition and detection systems increasingly rely on large amounts of training data and complex classifiers. Therefore, it becomes increasingly expensive both to manually annotate datasets and to keep running times at levels acceptable for practical applications. In this paper, we propose two solutions to address these issues. First, we introduce a weakly supervised, segmentation-based approach to learn accurate detectors and image classifiers from weak supervisory signals that provide only approximate constraints on target localization. We illustrate our system on the problem of action detection in static images (Pascal VOC Actions 2012), using human visual search patterns as our training signal. Second, inspired by the saccade-and-fixate operating principle of the human visual system, we use reinforcement learning techniques to train efficient search models for detection. Our sequential method is weakly supervised and general (it does not require eye movements), finds optimal search strategies for any given detection confidence function, and achieves performance similar to exhaustive sliding window search at a fraction of its computational cost.
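To make the cost argument concrete, the following minimal sketch contrasts exhaustive sliding window search with a sequential, fixation-based search over the same confidence function. It is an illustration under stated assumptions and not the paper's method. The `confidence` function, the 64x48 grid, the target position, and the hand-written coarse-to-fine policy are all hypothetical, and the policy stands in for the search model that the paper learns with reinforcement learning. This hand-written rule only works because the toy confidence has a single smooth peak.

```python
import math


def confidence(x, y, target=(37, 22), sigma=12.0):
    """Hypothetical detection confidence: a smooth bump peaking at `target`."""
    dx, dy = x - target[0], y - target[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def exhaustive_search(width, height, score):
    """Score every window position; return (best position, number of evaluations)."""
    positions = ((x, y) for x in range(width) for y in range(height))
    best = max(positions, key=lambda p: score(*p))
    return best, width * height


def sequential_search(width, height, score, step=16):
    """Fixate, score the 8 neighbours at distance `step`, and jump to the best one.

    When no neighbour improves on the current fixation, the step is halved.
    This fixed rule is an assumed stand-in for a learned search policy.
    """
    cache = {}  # each position is scored at most once

    def evaluate(p):
        if p not in cache:
            cache[p] = score(*p)
        return cache[p]

    current = (width // 2, height // 2)
    evaluate(current)
    while step >= 1:
        neighbours = [
            (current[0] + dx * step, current[1] + dy * step)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        ]
        neighbours = [(x, y) for x, y in neighbours
                      if 0 <= x < width and 0 <= y < height]
        best = max(neighbours, key=evaluate, default=current)
        if evaluate(best) > evaluate(current):
            current = best   # saccade to the more promising location
        else:
            step //= 2       # refine the search around the current fixation
    return current, len(cache)


if __name__ == "__main__":
    best_exh, n_exh = exhaustive_search(64, 48, confidence)
    best_seq, n_seq = sequential_search(64, 48, confidence)
    print("exhaustive:", best_exh, "evaluations:", n_exh)
    print("sequential:", best_seq, "evaluations:", n_seq)
```

On this toy problem both searches return the same location, while the sequential search scores only a small fraction of the positions. A learned policy, as described in the abstract, is meant to achieve a similar saving for arbitrary detection confidence functions, where a fixed local rule like this one could get stuck.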