Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks
This work improves localization accuracy in tasks such as phrase grounding and unsupervised object discovery, offering a box-based refinement technique.
The paper improves localization in weakly supervised and unsupervised methods by training box-based detectors on network outputs rather than on image data, achieving significant improvements in phrase grounding and unsupervised object discovery.
It has been established that training a box-based detector network can enhance the localization performance of weakly supervised and unsupervised methods. We extend this finding by showing that these detectors can also be used to improve the original network. To do so, we train the detectors on top of the network's output rather than on the image data, and backpropagate a suitable loss into the original network. Our experiments show a significant improvement in phrase grounding on the "what is where by looking" task, as well as for various unsupervised object discovery methods. Our code is available at https://github.com/eyalgomel/box-based-refinement.
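The abstract describes training a detector on the network's output and backpropagating a loss into the original network. Below is a minimal, dependency-free Python sketch of that general idea. It rests on several assumptions that are not taken from the paper. The network output is treated as a single 2-D heatmap. The detector's predicted box is replaced by a thresholded box (`heatmap_to_box`). The refinement loss is a per-pixel binary cross-entropy between the heatmap and the box mask. All function names are hypothetical, and the sketch updates the heatmap values directly rather than backpropagating into network weights.

```python
import math
from typing import List, Optional, Tuple

Heatmap = List[List[float]]  # per-pixel localization scores in [0, 1]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel indices


def heatmap_to_box(heatmap: Heatmap, thresh: float = 0.5) -> Optional[Box]:
    """Tightest box around all pixels scoring >= thresh; None if no pixel passes.
    Hypothetical stand-in for a trained detector's predicted box."""
    xs = [x for row in heatmap for x, v in enumerate(row) if v >= thresh]
    ys = [y for y, row in enumerate(heatmap) for v in row if v >= thresh]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def box_mask(height: int, width: int, box: Box) -> Heatmap:
    """Binary mask that is 1.0 inside the box and 0.0 elsewhere."""
    x0, y0, x1, y1 = box
    return [[1.0 if x0 <= x <= x1 and y0 <= y <= y1 else 0.0
             for x in range(width)] for y in range(height)]


def consistency_loss_and_grad(heatmap: Heatmap, box: Box,
                              eps: float = 1e-6) -> Tuple[float, Heatmap]:
    """Mean binary cross-entropy between the heatmap and the box mask (an
    assumed loss, not the paper's), plus its gradient w.r.t. each heatmap value."""
    h, w = len(heatmap), len(heatmap[0])
    mask = box_mask(h, w, box)
    n = h * w
    loss = 0.0
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            p = min(max(heatmap[y][x], eps), 1.0 - eps)  # avoid log(0)
            m = mask[y][x]
            loss -= m * math.log(p) + (1.0 - m) * math.log(1.0 - p)
            # d/dp of -[m log p + (1 - m) log(1 - p)] = (p - m) / (p (1 - p))
            grad[y][x] = (p - m) / (p * (1.0 - p)) / n
    return loss / n, grad


def refine_step(heatmap: Heatmap, box: Box, lr: float = 0.5) -> Heatmap:
    """One gradient-descent step that pulls the heatmap toward the box mask.
    Updated values are clipped back into [0, 1]."""
    _, grad = consistency_loss_and_grad(heatmap, box)
    return [[min(max(v - lr * g, 0.0), 1.0) for v, g in zip(row, g_row)]
            for row, g_row in zip(heatmap, grad)]


if __name__ == "__main__":
    hm = [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.7, 0.1],
        [0.1, 0.8, 0.6, 0.2],
        [0.0, 0.1, 0.2, 0.1],
    ]
    box = heatmap_to_box(hm)  # stand-in for a detector's predicted box
    print(box)  # (1, 1, 2, 2)
    before, _ = consistency_loss_and_grad(hm, box)
    after, _ = consistency_loss_and_grad(refine_step(hm, box), box)
    print(after < before)  # True
```

In an actual training pipeline, the gradient of such a loss would flow through the heatmap into the localization network's parameters via an autograd framework. The direct update in `refine_step` only illustrates the direction in which a box-derived loss pushes the localization map.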