Learning Transferable Reward for Query Object Localization with Policy Adaptation
This work addresses the challenge of adapting object localization agents to new environments or classes without extensive annotation. The contribution is incremental, as it builds on existing reinforcement learning and metric learning techniques.
The paper tackles query object localization by learning a transferable reward signal through ordinal metric learning and using it to train a reinforcement learning agent. This enables test-time policy adaptation to new environments without annotated images and outperforms fine-tuning approaches.
We propose a reinforcement-learning-based approach to query object localization, in which an agent is trained to localize objects of interest specified by a small exemplary set. We learn a transferable reward signal, formulated using the exemplary set, by ordinal metric learning. Our proposed method enables test-time policy adaptation to new environments where reward signals are not readily available, and it outperforms fine-tuning approaches, which are limited to annotated images. In addition, the transferable reward allows the trained agent to be repurposed from one specific class to another. Experiments on corrupted MNIST, CU-Birds, and COCO datasets demonstrate the effectiveness of our approach.
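To make the idea of a reward defined by the exemplary set more concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes details the abstract does not state: that the exemplary set is summarized by its mean embedding, that distances are Euclidean, that the ordinal constraint is a hinge loss ordered by IoU, and that the reward is +1 or -1. All function names are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def distance(u, v):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def prototype(exemplar_embeddings):
    """Mean embedding of the exemplary set (assumed summary of the query)."""
    n = len(exemplar_embeddings)
    return [sum(col) / n for col in zip(*exemplar_embeddings)]


def ordinal_loss(emb_i, emb_j, iou_i, iou_j, proto, margin=0.1):
    """Hinge loss asking the box with the higher IoU to lie closer to the prototype.

    Used only during training, when ground-truth boxes (and hence IoUs) exist.
    """
    if iou_i == iou_j:
        return 0.0
    better, worse = (emb_i, emb_j) if iou_i > iou_j else (emb_j, emb_i)
    return max(0.0, margin + distance(better, proto) - distance(worse, proto))


def transferable_reward(emb_prev, emb_next, proto):
    """+1 if the action moved the box embedding closer to the prototype, else -1.

    No ground-truth box is needed, so the same signal is available at test time.
    """
    return 1.0 if distance(emb_next, proto) < distance(emb_prev, proto) else -1.0


# Toy usage with hand-written 2-D embeddings (illustrative values only).
proto = prototype([[0.0, 0.0], [2.0, 0.0]])
reward = transferable_reward([4.0, 0.0], [2.0, 0.0], proto)
```

Under these assumptions the reward depends only on embeddings of the current box and the exemplary set, which is why it could still be computed in a new environment or for a new class where box annotations are unavailable.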