Depth Edge Alignment Loss: DEALing with Depth in Weakly Supervised Semantic Segmentation
This work addresses the need for robust semantic segmentation in autonomous robotic systems while reducing annotation costs. It is incremental in that it builds on existing weakly supervised methods.
The paper tackles the cost of pixel-level labeling for semantic segmentation in autonomous robotics by proposing a Depth Edge Alignment Loss that improves weakly supervised models. It reports improvements of up to +5.439, +1.274, and +16.416 points in mean Intersection over Union (mIoU) on the PASCAL VOC, MS COCO, and HOPE datasets, respectively.
Autonomous robotic systems applied to new domains require an abundance of expensive, pixel-level dense labels to train robust semantic segmentation models under full supervision. This study proposes a model-agnostic Depth Edge Alignment Loss to improve Weakly Supervised Semantic Segmentation models across different datasets. The methodology generates pixel-level semantic labels from image-level supervision, avoiding expensive annotation processes. While weak supervision is widely explored in traditional computer vision, our approach adds supervision from pixel-level depth information, a modality commonly available in robotic systems. We demonstrate that our approach improves segmentation performance across datasets and models, and that it can be combined with other losses for even better performance, with improvements of up to +5.439, +1.274, and +16.416 points in mean Intersection over Union on the PASCAL VOC validation set, the MS COCO validation set, and the HOPE static onboarding split, respectively. Our code will be made publicly available.
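For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean Intersection over Union is commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code. The function name `mean_iou`, the flattened-list input format, and the `ignore_index` value of 255 (a convention used for unlabeled pixels in PASCAL VOC) are assumptions made for illustration. Benchmark protocols typically accumulate the per-class counts over the whole evaluation split before averaging; this sketch does the same for whatever pixels it is given.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over the classes present in the prediction or ground truth.

    `pred` and `target` are flattened label maps of equal length.
    Pixels whose ground-truth label equals `ignore_index` are skipped
    (illustrative convention, not taken from the paper).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example: class 0 has IoU 1/2, class 1 has IoU 2/3.
    score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(f"mIoU = {100 * score:.3f} points")
```

Gains stated in "points", such as those above, are conventionally read as differences on the 0 to 100 scale obtained by multiplying this value by 100.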