CV · May 11

MUSDA: Multi-source Multi-modality Unsupervised Domain Adaptive 3D Object Detection for Autonomous Driving

Xiaohu Lu, Hamed Khatounabadi, Hayder Radha

arXiv:2605.10026

AI Analysis

For autonomous driving researchers, this work addresses the practical need to adapt 3D detectors to new environments without manual annotation, leveraging multiple labeled source domains and both camera and LiDAR modalities.

The paper tackles multi-source, multi-modality unsupervised domain adaptation for 3D object detection in autonomous driving. The proposed framework, using hierarchical spatially-conditioned domain classifiers and prototype graph weighted fusion, consistently outperforms state-of-the-art methods on Waymo, nuScenes, and Lyft datasets.

With the advancement of autonomous driving, numerous annotated multi-modality datasets have become available. This presents an opportunity to develop domain-adaptive 3D object detectors for new environments without relying on labor-intensive manual annotations. However, traditional domain adaptation methods typically focus on a single source domain or a single modality, limiting their effectiveness in multi-source, multi-modality scenarios. In this paper, we propose a novel framework for multi-source, multi-modality unsupervised domain adaptation in 3D object detection for autonomous driving. Given multiple labeled source domains and one unlabeled target domain, our framework first introduces hierarchical spatially-conditioned (HSC) domain classifiers, which jointly align features from both camera and LiDAR modalities at two distinct levels for each source-target domain pair. To effectively leverage information from multiple source domains, we construct a prototype graph between each pair of domains. Based on this, we develop a prototype graph weighted (PGW) multi-source fusion strategy to aggregate predictions from multiple source detection heads. Experimental results on three widely used 3D object detection datasets (Waymo, nuScenes, and Lyft) demonstrate that our proposed framework effectively integrates information across both modalities and source domains, consistently outperforming state-of-the-art methods.
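The abstract does not spell out how the prototype graph is built or how PGW turns it into fusion weights, so the following is only a rough illustration of the general idea of similarity-weighted multi-source fusion, not the authors' method. It assumes, hypothetically, that each domain is summarized by one feature prototype per class (target prototypes would have to come from pseudo-labels), that a graph edge is the mean cosine similarity between matching class prototypes, and that edge strengths are turned into weights with a softmax. The names `cosine`, `source_weights`, `fuse_scores`, and the `temperature` parameter are all invented for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def source_weights(source_protos, target_protos, temperature=1.0):
    """Assumed weighting rule: softmax over mean prototype similarity.

    source_protos: one entry per source domain, each a list of per-class vectors.
    target_protos: list of per-class vectors for the target domain.
    """
    sims = []
    for protos in source_protos:
        per_class = [cosine(p, t) for p, t in zip(protos, target_protos)]
        sims.append(sum(per_class) / len(per_class))
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(per_source_scores, weights):
    """Weighted average of class scores predicted by each source head."""
    num_classes = len(per_source_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, per_source_scores))
        for i in range(num_classes)
    ]


# Toy example: two source domains, two classes, 2-D prototypes.
target = [[1.0, 0.0], [0.0, 1.0]]
source_a = [[1.0, 0.0], [0.0, 1.0]]  # prototypes match the target
source_b = [[0.0, 1.0], [1.0, 0.0]]  # prototypes orthogonal to the target
weights = source_weights([source_a, source_b], target)
fused = fuse_scores([[0.9, 0.2], [0.1, 0.8]], weights)
```

In this toy setup the source whose prototypes lie closer to the target receives the larger weight, so its detection head dominates the fused class scores. A lower `temperature` would sharpen that preference; whether the paper uses anything comparable is not stated in the abstract.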

