Weakly Supervised Object Localization on grocery shelves using simple FCN and Synthetic Dataset
This work addresses the challenge of annotating diverse objects on grocery shelves and offers a solution for domains where collecting images is easy but annotating object coordinates is hard. The contribution is incremental, since it builds on existing weakly supervised techniques.
The paper tackles object localization on grocery shelves with limited annotation by proposing a weakly supervised method that combines a simple FCN with a ConvAE and predicts bounding boxes without direct coordinate labels.
We propose a weakly supervised method that uses two algorithms to predict object bounding boxes given only an image classification dataset. The first algorithm is a simple Fully Convolutional Network (FCN) trained to classify object instances. Because an FCN returns a mask when given an image larger than its training images, we pass an image pyramid through it at test time to obtain a primary output segmentation mask. The second algorithm, a Convolutional Encoder-Decoder (ConvAE), refines the FCN output mask into the final output bounding boxes. The ConvAE is trained to localize objects on an artificially generated dataset of output segmentation masks. We demonstrate the effectiveness of this method in localizing objects on grocery shelves, where annotating data for object detection is hard due to the variety of objects. The method can be extended to any problem domain where collecting images of objects is easy and annotating their coordinates is hard.
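The FCN property used at test time can be illustrated with a little arithmetic. The sketch below is not the authors' implementation. It assumes a square input, an FCN without padding that acts like a sliding-window classifier whose window equals the training image size, and a single total stride. The function names and the numbers in the usage note (64-pixel training images, stride 16, three pyramid scales) are hypothetical.

```python
def fcn_output_size(input_size, train_size, stride):
    """Side length of the score map an FCN yields for a square input.

    Assumption: the network behaves like a classifier with a window of
    train_size pixels slid over the input in steps of stride pixels.
    """
    if input_size < train_size:
        return 0  # input too small to fit even one window
    return (input_size - train_size) // stride + 1


def pyramid_map_sizes(image_size, train_size, stride, scales):
    """Score-map side length for each rescaled copy of the image."""
    return {
        scale: fcn_output_size(int(image_size * scale), train_size, stride)
        for scale in scales
    }


def cell_to_box(row, col, train_size, stride, scale):
    """Map one score-map cell back to (x0, y0, x1, y1) in the original image.

    scale is the factor by which the image was resized before the FCN saw it,
    so coordinates are divided by scale to undo the resize.
    """
    x0 = col * stride / scale
    y0 = row * stride / scale
    side = train_size / scale
    return (x0, y0, x0 + side, y0 + side)
```

With these assumed values, an input the size of a training image gives a single score, while a 256-pixel shelf image gives a 13 x 13 map, that is, `fcn_output_size(256, 64, 16)`. Smaller pyramid levels give coarser maps whose cells cover larger regions of the original image, which is how one fixed-size classifier can respond to objects at several scales.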