Deep GrabCut for Object Selection
This work addresses imprecise user input in interactive and instance segmentation for computer vision applications, though its contribution is incremental because it builds on existing bounding-box-based methods.
The paper tackles the problem of inaccurate bounding boxes in object segmentation by proposing a method that uses rectangles as soft constraints via Euclidean distance maps, achieving accurate segmentation results with sloppy rectangles and extending to curve-based inputs without retraining.
Most previous bounding-box-based segmentation methods assume the bounding box tightly covers the object of interest. However, it is common for a rectangle input to be too large or too small. In this paper, we propose a novel segmentation approach that uses a rectangle as a soft constraint by transforming it into a Euclidean distance map. A convolutional encoder-decoder network is trained end-to-end by concatenating images with these distance maps as inputs and predicting the object masks as outputs. Our approach gets accurate segmentation results given sloppy rectangles while being general for both interactive segmentation and instance segmentation. We show our network extends to curve-based input without retraining. We further apply our network to instance-level semantic segmentation and resolve any overlap using a conditional random field. Experiments on benchmark datasets demonstrate the effectiveness of the proposed approaches.
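To make the input encoding concrete, the following is a minimal sketch of how a rectangle might be turned into a Euclidean distance map and attached to an image as an extra channel. It is an illustration under stated assumptions, not the authors' implementation: the function names `rect_distance_map` and `concat_channels`, the inclusive `(x0, y0, x1, y1)` rectangle convention, the unsigned distance to the rectangle boundary, and the truncation value `cap` are all hypothetical choices that the abstract does not specify.

```python
import math


def rect_distance_map(height, width, rect, cap=255.0):
    """Return an H x W map of distances to the rectangle boundary.

    rect is (x0, y0, x1, y1) in inclusive pixel coordinates (assumed convention).
    Distances are unsigned and truncated at `cap` (an assumption, not from the source).
    """
    x0, y0, x1, y1 = rect
    dist = []
    for y in range(height):
        row = []
        for x in range(width):
            # Horizontal and vertical overshoot; both are zero inside the rectangle.
            dx = max(x0 - x, 0, x - x1)
            dy = max(y0 - y, 0, y - y1)
            if dx == 0 and dy == 0:
                # Inside or on the rectangle: the nearest boundary point lies on one of the four edges.
                d = min(x - x0, x1 - x, y - y0, y1 - y)
            else:
                # Outside: the nearest boundary point is on an edge or at a corner.
                d = math.hypot(dx, dy)
            row.append(min(float(d), cap))
        dist.append(row)
    return dist


def concat_channels(image, dist):
    """Append the distance value as an extra channel to an H x W x C image (nested lists)."""
    return [
        [list(pixel) + [d] for pixel, d in zip(image_row, dist_row)]
        for image_row, dist_row in zip(image, dist)
    ]
```

Because the rectangle enters only through this map, a loose or tight box changes the values the network sees rather than imposing a hard crop, which is one way to read "soft constraint". The same kind of map could in principle be computed from the pixels of a drawn curve instead of a rectangle, which would be consistent with the claim that the network accepts curve-based input without retraining.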