DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
This work targets multi-modal object re-identification for applications such as surveillance and robotics. The contribution is incremental: it builds on existing fusion methods and adds new decoupling and attention mechanisms.
The paper addresses two problems in multi-modal object re-identification: image quality that changes dynamically across modalities, and shared cross-modal information that weakens modality-specific features. Its proposed framework, DeMo, achieves state-of-the-art results on three benchmarks.
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by combining complementary information from multiple modalities. Existing multi-modal object ReID methods focus primarily on fusing heterogeneous features. However, they often overlook the dynamic quality changes in multi-modal imaging. In addition, the information shared between modalities can weaken modality-specific information. To address these issues, we propose DeMo, a novel feature learning framework for multi-modal object ReID that adaptively balances decoupled features using a mixture of experts. Specifically, we first deploy a Patch-Integrated Feature Extractor (PIFE) to extract multi-granularity and multi-modal features. Then, we introduce a Hierarchical Decoupling Module (HDM) that decouples multi-modal features into non-overlapping forms, preserving modality uniqueness and increasing feature diversity. Finally, we propose an Attention-Triggered Mixture of Experts (ATMoE), which replaces traditional gating with dynamic attention weights derived from the decoupled features. With these modules, DeMo generates more robust multi-modal features. Extensive experiments on three multi-modal object ReID benchmarks verify the effectiveness of our method. The source code is available at https://github.com/924973292/DeMo.
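To make the contrast with traditional gating concrete, here is a minimal plain-Python sketch of attention-derived expert weights. It is not the DeMo implementation. The following are all assumptions made for illustration: the query (the mean of the decoupled features), the one-linear-expert-per-feature design, scaled dot-product scoring, and the function names (`attention_weights`, `attention_triggered_moe`, `mean_vector`). The sketch shows only the idea the abstract describes. Instead of a separate gating network producing expert weights, the weights come from attention scores computed over the decoupled features themselves.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors: List[Vector]) -> Vector:
    # Assumption: the attention query is the mean of the decoupled features.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def linear(weight: Matrix, x: Vector) -> Vector:
    # A hypothetical expert: a single bias-free linear map.
    return [dot(row, x) for row in weight]


def attention_weights(query: Vector, keys: List[Vector]) -> List[float]:
    # Scaled dot-product scores of one query against each decoupled feature.
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])


def attention_triggered_moe(decoupled: List[Vector], experts: List[Matrix], query: Vector):
    # Expert weights come from attention over the features, not from a gating net.
    weights = attention_weights(query, decoupled)
    outputs = [linear(w, f) for w, f in zip(experts, decoupled)]
    dim = len(outputs[0])
    fused = [sum(wt * out[j] for wt, out in zip(weights, outputs)) for j in range(dim)]
    return fused, weights


# Toy demo: a strong feature and a weaker (e.g. degraded-modality) feature,
# with identity matrices standing in for trained experts.
features = [[2.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
experts = [identity, identity]
query = mean_vector(features)
fused, weights = attention_triggered_moe(features, experts, query)
print("weights:", weights)
print("fused:", fused)
```

With this mean-feature query, higher-magnitude features score higher. Attention implementations usually project the query and keys with learned matrices, which would decouple the weights from raw feature norms. That projection is also an assumption here, not a detail stated in the abstract.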