Active Object Localization with Deep Reinforcement Learning
This paper addresses object localization for computer vision applications, offering an incremental improvement in efficiency over existing methods by examining only a small number of regions per image.
The paper tackles the problem of localizing objects in images by training an agent with deep reinforcement learning to deform a bounding box, obtaining the best detection results among systems that do not use object proposals while analyzing only 11 to 25 regions per image to localize a single object instance.
We present an active detection model for localizing objects in scenes. The model is class-specific and allows an agent to focus attention on candidate regions for identifying the correct location of a target object. This agent learns to deform a bounding box using simple transformation actions, with the goal of determining the most specific location of target objects following top-down reasoning. The proposed localization agent is trained using deep reinforcement learning, and evaluated on the Pascal VOC 2007 dataset. We show that agents guided by the proposed model are able to localize a single instance of an object after analyzing only between 11 and 25 regions in an image, and obtain the best detection results among systems that do not use object proposals for object localization.
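To make the idea of deforming a bounding box with simple transformation actions concrete, the following is a minimal sketch, not the authors' implementation. The action names, the step size `ALPHA`, and the sign-based reward are illustrative assumptions that the abstract does not specify; the learned policy (the deep reinforcement learning part) is omitted entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its top-left and bottom-right corners."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


ALPHA = 0.2  # assumed step size, relative to the current box dimensions


def apply_action(box: Box, action: str, alpha: float = ALPHA) -> Box:
    """Return the box produced by one hypothetical transformation action."""
    dx = alpha * box.width
    dy = alpha * box.height
    if action == "right":
        return Box(box.x1 + dx, box.y1, box.x2 + dx, box.y2)
    if action == "left":
        return Box(box.x1 - dx, box.y1, box.x2 - dx, box.y2)
    if action == "down":
        return Box(box.x1, box.y1 + dy, box.x2, box.y2 + dy)
    if action == "up":
        return Box(box.x1, box.y1 - dy, box.x2, box.y2 - dy)
    if action == "bigger":
        return Box(box.x1 - dx, box.y1 - dy, box.x2 + dx, box.y2 + dy)
    if action == "smaller":
        return Box(box.x1 + dx, box.y1 + dy, box.x2 - dx, box.y2 - dy)
    raise ValueError(f"unknown action: {action}")


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def step_reward(before: Box, after: Box, target: Box) -> int:
    """Assumed reward: +1 if the action increased overlap with the target, else -1."""
    return 1 if iou(after, target) > iou(before, target) else -1


if __name__ == "__main__":
    box = Box(0, 0, 10, 10)
    target = Box(2, 0, 12, 10)
    moved = apply_action(box, "right")
    print(moved, iou(moved, target), step_reward(box, moved, target))
```

In this sketch each action changes the box by a fraction of its current size, so the steps shrink as the box tightens around the object. A trained agent would choose among such actions at every step and stop once it judges the box to be well aligned with the target.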