Three for one and one for three: Flow, Segmentation, and Surface Normals
This work addresses scene understanding for computer vision applications, but its contribution is incremental: it builds on existing modalities and combines them with a modular approach.
The paper tackles the problem of improving scene understanding by studying the mutual influence among optical flow, semantic segmentation, and surface normals, and finds that combining these modalities improves object boundaries, region consistency, and scene structure.
Optical flow, semantic segmentation, and surface normals represent different information modalities, yet together they provide better cues for scene understanding problems. In this paper, we study the influence among the three modalities: how each one affects the others and how efficiently they combine. We employ a modular approach using a convolutional refinement network that is trained with supervision but isolated from RGB images, to enforce joint modality features. To assist the training process, we create a large-scale synthetic outdoor dataset that supports dense annotation of semantic segmentation, optical flow, and surface normals. The experimental results show a positive influence among the three modalities, especially for object boundaries, region consistency, and scene structures.
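To make the idea of a refinement network "isolated from RGB images" concrete, the following is a minimal sketch of how its input might be assembled from the three modalities alone. The abstract does not specify the input encoding, so the per-pixel channel concatenation, the one-hot encoding of segmentation labels, and the names `one_hot` and `stack_modalities` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode a class index as a one-hot vector of length num_classes."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside [0, {num_classes})")
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def stack_modalities(
    flow: Sequence[Sequence[Sequence[float]]],     # H x W x 2 (u, v)
    seg: Sequence[Sequence[int]],                  # H x W class indices
    normals: Sequence[Sequence[Sequence[float]]],  # H x W x 3 (nx, ny, nz)
    num_classes: int,
) -> List[List[List[float]]]:
    """Concatenate the three modalities per pixel (assumed encoding).

    The result has 2 + num_classes + 3 channels and deliberately
    contains no RGB channels.
    """
    height, width = len(seg), len(seg[0])
    stacked = []
    for y in range(height):
        row = []
        for x in range(width):
            row.append(
                list(flow[y][x])
                + one_hot(seg[y][x], num_classes)
                + list(normals[y][x])
            )
        stacked.append(row)
    return stacked


# Toy 1 x 2 image with 3 hypothetical classes.
flow = [[(0.5, -1.0), (0.0, 0.0)]]
seg = [[2, 0]]
normals = [[(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]]
features = stack_modalities(flow, seg, normals, num_classes=3)
print(len(features[0][0]))  # channels per pixel: 2 + 3 + 3
```

Under this assumed encoding, a convolutional refinement network would take the stacked tensor as input and predict a refined version of one target modality, so any improvement must come from the other modalities rather than from image appearance.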