CV · Mar 10, 2025

Approximate Size Targets Are Sufficient for Accurate Semantic Segmentation

Xingye Fan, Zhongwen Zhang, Yuri Boykov

arXiv:2503.06954v1 · h-index: 42

Originality: Highly original

AI Analysis

This work offers a simpler, more robust alternative to complex methods for segmentation with image-level tags, which may benefit researchers in computer vision.

The paper tackles semantic segmentation with image-level supervision by using approximate object-size distributions instead of pixel-precise masks, achieving accuracy comparable to pixel-level supervision on PASCAL VOC and showing robustness to errors in the size targets.

This paper demonstrates a surprising result for segmentation with image-level targets: extending binary class tags to approximate relative object-size distributions allows off-the-shelf architectures to solve the segmentation problem. A straightforward zero-avoiding KL-divergence loss for average predictions produces segmentation accuracy comparable to the standard pixel-precise supervision with full ground truth masks. In contrast, current results based on class tags typically require complex, non-reproducible architectural modifications and specialized multi-stage training procedures. Our ideas are validated on PASCAL VOC using our new human annotations of approximate object sizes. We also show the results on COCO and medical data using synthetically corrupted size targets. All standard networks demonstrate robustness to the size targets' errors. For some classes, the validation accuracy is significantly better than with pixel-level supervision; the latter is not robust to errors in the masks. Our work provides new ideas and insights on image-level supervision in segmentation and may encourage other simple general solutions to the problem.
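The size-target loss described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the "zero-avoiding" direction is KL(size target || average prediction), which is the usual meaning of the term, and that the average prediction is the mean of per-pixel softmax outputs over the image. The function names, the `eps` stabilizer, and the array shapes are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def size_target_kl_loss(logits, size_target, eps=1e-8):
    """Assumed form of the loss: KL(size_target || average prediction).

    logits:      (H, W, K) per-pixel class scores for one image.
    size_target: (K,) approximate relative object sizes, summing to 1.
    eps:         hypothetical stabilizer to keep the log finite.
    """
    probs = softmax(logits, axis=-1)                  # (H, W, K)
    avg_pred = probs.reshape(-1, probs.shape[-1]).mean(axis=0)  # (K,)
    present = size_target > 0                         # 0 * log(0) terms are dropped
    t = size_target[present]
    # The loss grows sharply if the network assigns near-zero average
    # probability to a class whose target size is positive.
    return float(np.sum(t * (np.log(t) - np.log(avg_pred[present] + eps))))

# Usage: a 2x2 image, 4 classes, uniform predictions.
logits = np.zeros((2, 2, 4))
target = np.array([0.5, 0.5, 0.0, 0.0])
loss = size_target_kl_loss(logits, target)
```

With this direction of the divergence, classes absent from the size target contribute nothing directly, while any class present in the target must receive non-negligible average probability, which is the behavior the term "zero-avoiding" refers to.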
