Weakly Supervised Foreground Learning for Weakly Supervised Localization and Detection
This work addresses the need for large annotated datasets in computer vision by providing a low-cost method for weakly supervised tasks, though the contribution is incremental because it builds on existing WSOL and WSOD frameworks.
The paper tackles weakly supervised object localization (WSOL) and detection (WSOD) by introducing weakly supervised foreground learning (WSFL), which improves both tasks using predicted foreground masks and requires no localization annotations. It reports 72.97% correct localization accuracy on CUB for WSOL and 55.7% mAP on VOC07 for WSOD.
Modern deep learning models require large amounts of accurately annotated data, a requirement that is often difficult to satisfy. Hence, weakly supervised tasks, including weakly supervised object localization (WSOL) and detection (WSOD), have recently received attention in the computer vision community. In this paper, we motivate and propose the weakly supervised foreground learning (WSFL) task by showing that both WSOL and WSOD can be greatly improved if ground-truth foreground masks are available. More importantly, we propose a complete WSFL pipeline with low computational cost, which generates pseudo boxes, learns foreground masks, and does not need any localization annotations. With the help of foreground masks predicted by our WSFL model, we achieve 72.97% correct localization accuracy on CUB for WSOL and 55.7% mean average precision on VOC07 for WSOD, thereby establishing a new state of the art for both tasks. Our WSFL model also shows excellent transfer ability.
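To make the link between a foreground mask and a localization result concrete, the following is a minimal sketch, not the method of the paper. It assumes a predicted mask is a 2D grid of foreground probabilities, that a box is taken as the tight rectangle around pixels above a threshold of 0.5, and that a prediction is judged by intersection over union (IoU) against a ground-truth box. The function names `mask_to_box` and `iou`, the threshold, and the toy values are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[float]], threshold: float = 0.5) -> Optional[Box]:
    """Return the tight box around all pixels whose score is >= threshold.

    The threshold value is an assumption made for this sketch.
    Returns None when no pixel reaches the threshold.
    """
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes with inclusive pixel coordinates."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


# Toy 4x4 mask with a 2x2 foreground region (made-up values).
toy_mask = [
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.0],
    [0.0, 0.1, 0.2, 0.0],
]
predicted_box = mask_to_box(toy_mask)
ground_truth_box: Box = (1, 1, 3, 2)  # hypothetical annotation
overlap = iou(predicted_box, ground_truth_box)
```

Under the common convention that a localization counts as correct when IoU is at least 0.5, the toy prediction above would be counted as a hit; whether the reported CUB and VOC07 numbers use exactly this criterion is not stated in the abstract.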