CVApr 20

HMR-Net: Hierarchical Modular Routing for Cross-Domain Object Detection in Aerial Images

arXiv:2604.1886615.2h-index: 25
Predicted impact top 54% in CV · last 90 daysOriginality Incremental advance
AI Analysis

This work addresses the challenge of domain shift in aerial object detection by enabling structured specialization, which is important for remote sensing applications where datasets vary in resolution, scene composition, and label coverage.

HMR-Net introduces a hierarchical modular routing framework for cross-domain object detection in aerial images, achieving improved multi-dataset generalization, regional specialization, and open-category detection across four datasets.

Despite advances in object detection, aerial imagery remains a challenging domain, as models often fail to generalize across variations in spatial resolution, scene composition, and semantic label coverage. Differences in geographic context, sensor characteristics, and object distributions across datasets limit the capacity of conventional models to learn consistent and transferable representations. Shared methods trained on such data tend to impose a unified representation across fundamentally different domains, resulting in poor performance on region-specific content and less flexibility when dealing with novel object categories. To address this, we propose a novel modular learning framework that enables structured specialization in aerial detection. Our method introduces a hierarchical routing mechanism with two levels of modularity: a global expert assignment layer that uses latent geographic embeddings to route datasets to specialized processing modules, and a local scene decomposition mechanism that allocates image subregions to region-specific sub-modules. This allows our method to specialize across datasets and within complex scenes. Additionally, the framework contains a conditional expert module that uses external semantic information (e.g., category names or textual descriptions) to enable detection of novel object categories during inference, without the need for retraining or fine-tuning. By moving beyond monolithic representations, our method offers an adaptive framework for remote sensing object detection. Comprehensive evaluations on four datasets highlight improvements in multi-dataset generalization, regional specialization, and open-category detection.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes