CV | Mar 17, 2025

MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models

arXiv:2503.13743v1 | 9 citations | h-index: 5 | ICRA
Originality: Highly original
AI Analysis

The paper addresses domain shift in monocular 3D detection for applications such as autonomous driving and drones. It represents an incremental improvement, contributing novel modules for depth enhancement and pseudo label scoring.

The paper tackles monocular 3D object detection across different sensors and environments by introducing MonoCT, an unsupervised domain adaptation approach that generates pseudo labels for self-supervision. MonoCT outperforms existing SOTA methods by large margins (e.g., ~21% minimum for AP Mod.) on six benchmarks.

We tackle the problem of monocular 3D object detection across different sensors, environments, and camera setups. In this paper, we introduce a novel unsupervised domain adaptation approach, MonoCT, that generates highly accurate pseudo labels for self-supervision. Inspired by our observation that accurate depth estimation is critical to mitigating domain shifts, MonoCT introduces a novel Generalized Depth Enhancement (GDE) module with an ensemble concept to improve depth estimation accuracy. Moreover, we introduce a novel Pseudo Label Scoring (PLS) module by exploring inner-model consistency measurement and a Diversity Maximization (DM) strategy to further generate high-quality pseudo labels for self-training. Extensive experiments on six benchmarks show that MonoCT outperforms existing SOTA domain adaptation methods by large margins (~21% minimum for AP Mod.) and generalizes well to car, traffic camera and drone views.

