Normalized Cut Loss for Weakly-supervised CNN Segmentation
This paper addresses the cost of annotation for semantic segmentation in computer vision, though it is an incremental improvement over existing weakly-supervised methods.
The paper trains semantic segmentation models under weak supervision (e.g., scribbles) with a new loss function that combines cross-entropy on known seeds with normalized cut over all pixels, which reduces the errors caused by mislabeled pixels in generated masks. This approach brings the quality of weakly-supervised training significantly closer to that of fully supervised methods.
Most recent semantic segmentation methods train deep convolutional neural networks with fully annotated masks, which must be pixel-accurate for good-quality training. Common weakly-supervised approaches generate full masks from partial input (e.g. scribbles or seeds) using standard interactive segmentation methods as preprocessing. However, errors in such masks result in poorer training, since standard loss functions (e.g. cross-entropy) do not distinguish seeds from other, potentially mislabeled pixels. Inspired by the general ideas in semi-supervised learning, we address these problems via a new principled loss function that evaluates network output with criteria standard in "shallow" segmentation, e.g. normalized cut. Unlike prior work, the cross-entropy part of our loss evaluates only seeds, where labels are known, while normalized cut softly evaluates the consistency of all pixels. We focus on a normalized cut loss in which a dense Gaussian kernel is efficiently implemented in linear time by fast bilateral filtering. Our normalized cut loss approach to segmentation brings the quality of weakly-supervised training significantly closer to that of fully supervised methods.
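The structure of such a loss can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation, and it rests on several assumptions. The normalized cut term uses a common soft relaxation, summing cut(S_k) / assoc(S_k) over classes, with the per-class soft segment S_k taken from the softmax output. The affinity is a dense O(N^2) Gaussian kernel, which is only practical for tiny inputs and is not the linear-time bilateral filtering mentioned above. The function names, the `ignore_label` convention, and the `weight` parameter are hypothetical.

```python
import math

def gaussian_affinity(features, sigma):
    """Dense N x N Gaussian kernel over per-pixel feature vectors (assumed, e.g. color + position)."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            W[i][j] = math.exp(-sq / (2.0 * sigma ** 2))
    return W

def partial_cross_entropy(probs, labels, ignore_label=-1):
    """Mean cross-entropy over seed pixels only; unlabeled pixels are skipped."""
    terms = [-math.log(p[y] + 1e-12)
             for p, y in zip(probs, labels) if y != ignore_label]
    return sum(terms) / len(terms) if terms else 0.0

def soft_normalized_cut(probs, W):
    """Sum over classes of S_k^T W (1 - S_k) / (d^T S_k), with d the row sums of W."""
    n = len(probs)
    num_classes = len(probs[0])
    degree = [sum(row) for row in W]
    loss = 0.0
    for k in range(num_classes):
        s = [p[k] for p in probs]
        cut = sum(s[i] * W[i][j] * (1.0 - s[j])
                  for i in range(n) for j in range(n))
        assoc = sum(degree[i] * s[i] for i in range(n))
        loss += cut / (assoc + 1e-12)
    return loss

def total_loss(probs, labels, W, weight=1.0):
    """Seed-only cross-entropy plus a weighted normalized cut term over all pixels."""
    return partial_cross_entropy(probs, labels) + weight * soft_normalized_cut(probs, W)

# Four "pixels" forming two well-separated feature clusters; only two are seeds.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = [0, -1, -1, 1]
W = gaussian_affinity(features, sigma=1.0)
consistent = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(total_loss(consistent, labels, W), total_loss(mixed, labels, W))
```

Both assignments agree with the two seeds, so the cross-entropy term cannot tell them apart. The normalized cut term is close to 0 for the assignment that follows the feature clusters and close to 1 for the mixed one, which is the sense in which it softly evaluates the consistency of all pixels.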