CV · Dec 21, 2021

Generalizing Interactive Backpropagating Refinement for Dense Prediction

arXiv:2112.10969v2 · 9 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more flexible interactive refinement in computer vision, offering a generalized scheme that improves user control and model accuracy across several dense prediction tasks. It is an incremental advance that builds on prior backpropagating refinement schemes.

The paper addresses a limitation of existing backpropagating refinement methods, which allow only global adjustments, by introducing Generalized Backpropagating Refinement Scheme (G-BRS) layers. These layers enable both global and localized refinement for dense prediction tasks such as segmentation and depth estimation, and they significantly improve the performance of state-of-the-art models with only a few clicks.

As deep neural networks become the state-of-the-art approach in the field of computer vision for dense prediction tasks, many methods have been developed for automatic estimation of the target outputs given the visual inputs. Although the estimation accuracy of the proposed automatic methods continues to improve, interactive refinement is oftentimes necessary for further correction. Recently, feature backpropagating refinement scheme (f-BRS) has been proposed for the task of interactive segmentation, which enables efficient optimization of a small set of auxiliary variables inserted into the pretrained network to produce object segmentation that better aligns with user inputs. However, the proposed auxiliary variables only contain channel-wise scale and bias, limiting the optimization to global refinement only. In this work, in order to generalize backpropagating refinement for a wide range of dense prediction tasks, we introduce a set of G-BRS (Generalized Backpropagating Refinement Scheme) layers that enable both global and localized refinement for the following tasks: interactive segmentation, semantic segmentation, image matting and monocular depth estimation. Experiments on SBD, Cityscapes, Mapillary Vista, Composition-1k and NYU-Depth-V2 show that our method can successfully generalize and significantly improve performance of existing pretrained state-of-the-art models with only a few clicks.
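The global refinement that the abstract attributes to f-BRS (optimizing only a channel-wise scale and bias so that the output agrees with user clicks) can be illustrated with a toy sketch. This is not the authors' implementation: the frozen "network" here is assumed to be a single 1x1 linear head over a feature map, the click loss is assumed to be binary cross-entropy, and the names `predict`, `click_loss`, and `refine` are hypothetical. Gradients are written out by hand in NumPy rather than taken from a deep learning framework.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict(features, head_w, scale, bias):
    """Apply channel-wise scale and bias, then the frozen 1x1 linear head.

    features: (C, H, W) frozen feature map (assumed toy input)
    head_w:   (C,) frozen head weights (assumed toy network)
    """
    adjusted = scale[:, None, None] * features + bias[:, None, None]
    return sigmoid(np.tensordot(head_w, adjusted, axes=1))  # (H, W)


def click_loss(probs, clicks):
    """Mean binary cross-entropy evaluated at the clicked pixels only."""
    eps = 1e-12
    terms = [
        -(y * np.log(probs[r, c] + eps) + (1 - y) * np.log(1 - probs[r, c] + eps))
        for r, c, y in clicks
    ]
    return float(np.mean(terms))


def refine(features, head_w, clicks, steps=100, lr=0.5):
    """Gradient descent on the auxiliary scale and bias; the head stays frozen.

    clicks: list of (row, col, label) with label 1.0 (foreground) or 0.0.
    """
    num_channels = features.shape[0]
    scale = np.ones(num_channels)   # identity initialization
    bias = np.zeros(num_channels)
    rows = np.array([r for r, _, _ in clicks])
    cols = np.array([c for _, c, _ in clicks])
    labels = np.array([y for _, _, y in clicks], dtype=float)
    f_click = features[:, rows, cols]  # (C, K) features at the K clicks

    for _ in range(steps):
        logits = head_w @ (scale[:, None] * f_click + bias[:, None])  # (K,)
        err = sigmoid(logits) - labels  # d(BCE)/d(logit) per click
        grad_scale = (head_w[:, None] * f_click * err[None, :]).mean(axis=1)
        grad_bias = head_w * err.mean()
        scale -= lr * grad_scale
        bias -= lr * grad_bias
    return scale, bias
```

Because each channel gets one scale and one bias shared by every pixel, any update driven by a click changes the prediction across the whole image. That is the "global refinement only" limitation the abstract describes and that the G-BRS layers are meant to lift.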

