Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
This provides a general framework for improving structured output tasks like segmentation and classification, but it is incremental as it builds on existing gradient-based optimization methods.
The paper tackles structured output prediction by training a deep value network to estimate task loss for different outputs, then using gradient descent on continuous relaxations for inference. It achieves state-of-the-art results on multi-label classification and image segmentation benchmarks.
We approach structured output prediction by optimizing a deep value network (DVN) to precisely estimate the task loss on different output configurations for a given input. Once the model is trained, we perform inference by gradient descent on the continuous relaxations of the output variables to find outputs with promising scores from the value network. When applied to image segmentation, the value network takes an image and a segmentation mask as inputs and predicts a scalar estimating the intersection over union between the input and ground truth masks. For multi-label classification, the DVN's objective is to correctly predict the F1 score for any potential label configuration. The DVN framework achieves the state-of-the-art results on multi-label prediction and image segmentation benchmarks.