DPF: Learning Dense Prediction Fields with Weak Supervision
DPF addresses the high cost and impracticality of dense annotations for tasks such as scene parsing and intrinsic image decomposition, enabling more efficient and scalable visual understanding.
The paper tackles the cost of pixel-wise dense annotation in visual scene understanding by proposing dense prediction fields (DPFs), a new paradigm that learns from point-level weak supervision for tasks such as semantic parsing and intrinsic image decomposition. DPFs achieve state-of-the-art performance on PASCALContext, ADE20K, and IIW by significant margins.
Nowadays, many visual scene understanding problems are addressed by dense prediction networks. But pixel-wise dense annotations are very expensive (e.g., for scene parsing) or impossible (e.g., for intrinsic image decomposition), motivating us to leverage cheap point-level weak supervision. However, existing pointly-supervised methods still use the same architectures designed for full supervision. In stark contrast, we propose a new paradigm that makes predictions for point coordinate queries, inspired by the recent success of implicit representations such as distance or radiance fields. The method is therefore named dense prediction fields (DPFs). DPFs generate expressive intermediate features for continuous sub-pixel locations, thus allowing outputs of arbitrary resolution. DPFs are naturally compatible with point-level supervision. We showcase the effectiveness of DPFs on two substantially different tasks: high-level semantic parsing and low-level intrinsic image decomposition. In these two cases, supervision comes in the form of a single-point semantic category and a two-point relative reflectance judgment, respectively. As benchmarked on three large-scale public datasets, PASCALContext, ADE20K, and IIW, DPFs set new state-of-the-art performance on all of them by significant margins. Code can be accessed at https://github.com/cxx226/DPF.
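To make the idea of predicting for point coordinate queries concrete, the following is a minimal NumPy sketch and not the authors' architecture. It assumes a backbone has already produced a feature map `feat`, interpolates that map bilinearly at continuous (x, y) locations, appends the normalized coordinates, and decodes each point with a small two-layer MLP. The names `bilinear_sample`, `query_field`, and `point_cross_entropy`, the random weights, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def bilinear_sample(feat, xy):
    """Sample a (C, H, W) feature map at continuous (x, y) pixel coordinates.

    xy has shape (N, 2). Returns an (N, C) array of interpolated features.
    """
    C, H, W = feat.shape
    x = np.clip(xy[:, 0], 0, W - 1)
    y = np.clip(xy[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0  # horizontal interpolation weight, shape (N,)
    wy = y - y0  # vertical interpolation weight, shape (N,)
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx  # (C, N)
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx  # (C, N)
    return (top * (1 - wy) + bot * wy).T


def query_field(feat, xy, w1, b1, w2, b2):
    """Return class logits for each queried point (assumed 2-layer MLP decoder)."""
    C, H, W = feat.shape
    local = bilinear_sample(feat, xy)                 # (N, C)
    norm_xy = xy / np.array([W - 1, H - 1]) * 2 - 1   # coordinates scaled to [-1, 1]
    h = np.concatenate([local, norm_xy], axis=1)      # (N, C + 2)
    h = np.maximum(h @ w1 + b1, 0)                    # ReLU hidden layer
    return h @ w2 + b2


def point_cross_entropy(logits, labels):
    """Cross-entropy averaged over the annotated points only."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


rng = np.random.default_rng(0)
C, H, W, hidden, num_classes = 8, 16, 16, 32, 5
feat = rng.normal(size=(C, H, W))          # stand-in for backbone features
w1 = rng.normal(size=(C + 2, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.normal(size=(hidden, num_classes)) * 0.1
b2 = np.zeros(num_classes)

# Weak supervision: two annotated sub-pixel points, each with one class label.
points = np.array([[3.25, 7.5], [10.0, 2.75]])
labels = np.array([1, 4])
loss = point_cross_entropy(query_field(feat, points, w1, b1, w2, b2), labels)

# Inference: query any grid, here 40x40 from a 16x16 feature map.
ys, xs = np.meshgrid(np.linspace(0, H - 1, 40), np.linspace(0, W - 1, 40), indexing="ij")
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
dense = query_field(feat, grid, w1, b1, w2, b2).reshape(40, 40, num_classes)
```

Because the loss is evaluated only at the queried coordinates, sparse point labels need no dense target map, and the same query function yields an output of any resolution by changing the grid. A two-point relative reflectance objective would query both points of a pair and compare their predictions; that variant is omitted here since its exact form is not given in the abstract.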