Two-Phase Learning for Weakly Supervised Object Localization
This paper addresses a key limitation of weakly supervised learning in computer vision. It offers a novel training scheme that improves localization accuracy, though the contribution appears incremental because it builds on existing fully convolutional networks.
The paper tackles weakly supervised object localization, where models trained with only image-level annotations tend to focus on the most discriminative parts of an object. It proposes a two-phase learning method that suppresses the most salient activations so that the network captures the entire object, and it reports improved performance on tasks such as semantic segmentation and object location prediction.
Weakly supervised semantic segmentation and localization have a problem of focusing only on the most important parts of an image since they use only image-level annotations. In this paper, we solve this problem fundamentally via two-phase learning. Our networks are trained in two steps. In the first step, a conventional fully convolutional network (FCN) is trained to find the most discriminative parts of an image. In the second step, the activations on the most salient parts are suppressed by inference conditional feedback, and then the second learning is performed to find the area of the next most important parts. By combining the activations of both phases, the entire portion of the target object can be captured. Our proposed training scheme is novel and can be utilized in well-designed techniques for weakly supervised semantic segmentation, salient region detection, and object location prediction. Detailed experiments demonstrate the effectiveness of our two-phase learning in each task.
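The suppress-then-combine idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how inference conditional feedback is computed, so the sketch assumes a simple thresholded mask over the first-phase heatmap and an element-wise maximum for combining the two phases. The function names `suppress_salient` and `combine_heatmaps` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def suppress_salient(features, phase1_heatmap, threshold=0.5):
    """Zero out feature locations that the first phase found most salient.

    features:       array of shape (C, H, W), an intermediate feature map.
    phase1_heatmap: array of shape (H, W), the first-phase activation map.
    threshold:      assumed cutoff on the heatmap after scaling to [0, 1].
    """
    # Scale the heatmap to [0, 1] so the threshold is comparable across images.
    scaled = phase1_heatmap - phase1_heatmap.min()
    peak = scaled.max()
    if peak > 0:
        scaled = scaled / peak

    # Keep only locations that were NOT strongly activated in phase one.
    keep = (scaled < threshold).astype(features.dtype)

    # Broadcast the (H, W) mask over the channel axis.
    return features * keep[None, :, :]


def combine_heatmaps(phase1_heatmap, phase2_heatmap):
    """Merge both phases; an element-wise maximum is an assumption here."""
    return np.maximum(phase1_heatmap, phase2_heatmap)


if __name__ == "__main__":
    features = np.ones((2, 3, 3))
    phase1 = np.array([[0.0, 0.1, 0.0],
                       [0.1, 1.0, 0.2],
                       [0.0, 0.2, 0.0]])
    # Toy stand-in for a heatmap produced by the second training phase.
    phase2 = np.array([[0.0, 0.8, 0.0],
                       [0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0]])

    masked = suppress_salient(features, phase1)
    print(masked[0])
    print(combine_heatmaps(phase1, phase2))
```

In an actual two-phase setup, the masked features would feed the second network during training so that it is forced to respond to the next most important regions. Here the second-phase heatmap is a hand-written stand-in used only to show the combination step.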