CV · Jun 3, 2025

Enhancing Monocular Height Estimation via Weak Supervision from Imperfect Labels

arXiv:2506.02534v2 · h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses the challenge of model generalization in remote sensing for large-scale applications. It is an incremental advance, since it builds on existing monocular height estimation networks.

The paper tackles the problem of limited high-quality annotated data for monocular height estimation by using imperfect labels from out-of-domain regions, achieving reductions in average RMSE of up to 22.94% on DFC23 and 18.62% on GBH compared to baselines.

Monocular height estimation provides an efficient and cost-effective solution for three-dimensional perception in remote sensing. However, training deep neural networks for this task demands abundant annotated data, while high-quality labels are scarce and typically available only in developed regions, which limits model generalization and constrains applicability at large scales. This work addresses the problem by training pixel-wise height estimation networks on imperfect labels from out-of-domain regions; these labels may be incomplete, inexact, or inaccurate compared to high-quality annotations. We introduce an ensemble-based pipeline compatible with any monocular height estimation network, featuring an architecture and loss functions specifically designed to leverage information in noisy labels through weak supervision, utilizing balanced soft losses and ordinal constraints. Experiments on two datasets -- DFC23 (0.5--1 m) and GBH (3 m) -- show that our method achieves more consistent cross-domain performance, reducing average RMSE by up to 22.94% on DFC23 and 18.62% on GBH compared with baselines. Ablation studies confirm the contribution of each design component.
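
To make the headline metric concrete, the following is a minimal sketch of how a relative RMSE reduction such as "22.94%" is typically computed. It is not the paper's evaluation code: the function names `rmse` and `relative_reduction` and all height values are hypothetical, chosen only for illustration, and the paper's averaging across domains is not reproduced here.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences of heights (metres)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, method_rmse):
    """Percentage by which method_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - method_rmse) / baseline_rmse


# Hypothetical per-pixel heights; these numbers are NOT taken from the paper.
target = [3.0, 10.0, 0.0, 25.0]
baseline_pred = [5.0, 6.0, 2.0, 21.0]
method_pred = [4.0, 8.0, 1.0, 23.0]

base = rmse(baseline_pred, target)
ours = rmse(method_pred, target)
reduction = relative_reduction(base, ours)
print(f"baseline RMSE={base:.3f} m, method RMSE={ours:.3f} m, reduction={reduction:.2f}%")
```

Because the reduction is relative to the baseline, the same percentage can correspond to very different absolute improvements in metres on datasets with different resolutions and height ranges.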

