Bounding Box Embedding for Single Shot Person Instance Segmentation
This work addresses person instance segmentation, a common task in applications such as surveillance and robotics. The contribution is incremental: it builds on the existing DeepLabv3+ architecture with minimal modifications.
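The authors tackle person instance segmentation with a single-shot model that predicts segmentation masks together with, for each pixel, the bounding box of the instance to which it belongs; these per-pixel boxes are used to group pixels into instances. The model achieves competitive results for the person class on the COCO instance segmentation task.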
We present a bottom-up approach for the task of object instance segmentation using a single-shot model. The proposed model employs a fully convolutional network which is trained to predict class-wise segmentation masks as well as the bounding boxes of the object instances to which each pixel belongs. This allows us to group object pixels into individual instances. Our network architecture is based on the DeepLabv3+ model, and requires only minimal extra computation to achieve pixel-wise instance assignments. We apply our method to the task of person instance segmentation, a common task relevant to many applications. We train our model with COCO data and report competitive results for the person class in the COCO instance segmentation task.
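To make the grouping idea concrete, the following is a minimal sketch of how per-pixel bounding-box predictions could be turned into instance labels. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the grouping rule, so a simple greedy clustering by box IoU is swapped in here. The names `box_iou`, `group_pixels`, and `iou_thresh` are hypothetical, and the network outputs are assumed to be a boolean person mask of shape (H, W) and a box map of shape (H, W, 4) in (x1, y1, x2, y2) format.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area + areas - inter
    return inter / np.maximum(union, 1e-9)


def group_pixels(person_mask, pixel_boxes, iou_thresh=0.5):
    """Greedy grouping (an assumption, not the paper's rule).

    person_mask: (H, W) bool array, True where a pixel is classified as person.
    pixel_boxes: (H, W, 4) float array, the box each pixel predicts for its instance.
    Returns an (H, W) int array of instance ids, with 0 for background.
    """
    labels = np.zeros(person_mask.shape, dtype=np.int32)
    ys, xs = np.nonzero(person_mask)
    boxes = pixel_boxes[ys, xs]
    unassigned = np.ones(len(boxes), dtype=bool)
    next_id = 1
    while unassigned.any():
        idx = np.flatnonzero(unassigned)
        seed = boxes[idx[0]]  # first unassigned pixel seeds a new instance
        match = box_iou(seed, boxes[idx]) >= iou_thresh
        match[0] = True  # always consume the seed, even for a degenerate box
        members = idx[match]
        labels[ys[members], xs[members]] = next_id
        unassigned[members] = False
        next_id += 1
    return labels
```

Pixels whose predicted boxes overlap the seed's box strongly are assigned to the same instance, so the only work beyond the network's forward pass is this lightweight comparison of boxes. A practical system would likely choose seeds more carefully (for example by confidence) and filter very small groups; those choices are left out of the sketch.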