Learning to Fuse Things and Stuff
This paper addresses holistic scene understanding in computer vision, but its contribution is incremental, since it builds on existing segmentation tasks.
The paper tackles panoptic segmentation by unifying instance and semantic segmentation in a single end-to-end model, achieving results competitive with state-of-the-art methods on multiple benchmarks.
We propose an end-to-end learning approach for panoptic segmentation, a novel task unifying instance (things) and semantic (stuff) segmentation. Our model, TASCNet, uses feature maps from a shared backbone network to predict both things and stuff segmentations in a single feed-forward pass. We explicitly constrain these two output distributions through a global things and stuff binary mask to enforce cross-task consistency. Our proposed unified network is competitive with the state of the art on several panoptic segmentation benchmarks, as well as on the individual semantic and instance segmentation tasks.
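The cross-task consistency idea can be illustrated with a minimal sketch. Both heads are reduced to a global things-vs-stuff mask, and the disagreement between the two masks is penalized. The semantic (stuff) head's mask is obtained by summing, per pixel, the probabilities of all thing classes. The instance (things) head's mask is the pixelwise union of its instance masks. All function names are hypothetical. The sketch also assumes instance masks are already pasted into full-image coordinates and that the penalty is a mean squared (L2-style) difference. The paper's exact formulation may differ.

```python
from typing import List, Sequence, Set

Grid = List[List[float]]


def things_mask_from_semantic(probs: Sequence[Sequence[Sequence[float]]],
                              thing_classes: Set[int]) -> Grid:
    """Collapse per-pixel class probabilities (H x W x C) from the semantic
    head into a soft things mask by summing the thing-class probabilities."""
    return [[sum(p for c, p in enumerate(pixel) if c in thing_classes)
             for pixel in row]
            for row in probs]


def things_mask_from_instances(masks: Sequence[Grid], height: int, width: int) -> Grid:
    """Union of instance masks (each H x W, values in [0, 1]) via pixelwise max.
    Assumes masks are already placed in full-image coordinates (hypothetical)."""
    out = [[0.0] * width for _ in range(height)]
    for m in masks:
        for i in range(height):
            for j in range(width):
                out[i][j] = max(out[i][j], m[i][j])
    return out


def consistency_loss(a: Grid, b: Grid) -> float:
    """Mean squared difference between the two things masks (assumed L2 penalty)."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # 2x2 image, 3 classes: class 0 is stuff (e.g. road), classes 1 and 2 are things.
    probs = [[[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
             [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]]
    semantic_things = things_mask_from_semantic(probs, {1, 2})
    # A single predicted instance covering the right-hand column.
    instance_things = things_mask_from_instances([[[0.0, 1.0], [0.0, 1.0]]], 2, 2)
    print(round(consistency_loss(semantic_things, instance_things), 4))  # 0.0125
```

A loss of zero means both heads agree on which pixels belong to things. Any mismatch produces a gradient that pushes the two heads toward a consistent partition of the image.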