CVAug 5, 2025

CHARM: Collaborative Harmonization across Arbitrary Modalities for Modality-agnostic Semantic Segmentation

arXiv:2508.03060v21 citationsh-index: 17
Originality Incremental advance
AI Analysis

This work addresses robust scene understanding for applications like autonomous systems by enabling cross-modal complementarity, though it is incremental in advancing existing methods.

The paper tackles the problem of modality-agnostic semantic segmentation by proposing CHARM, a framework that achieves cooperative harmonization across arbitrary input modalities, resulting in consistent performance improvements over baselines, especially for fragile modalities.

Modality-agnostic Semantic Segmentation (MaSS) aims to achieve robust scene understanding across arbitrary combinations of input modality. Existing methods typically rely on explicit feature alignment to achieve modal homogenization, which dilutes the distinctive strengths of each modality and destroys their inherent complementarity. To achieve cooperative harmonization rather than homogenization, we propose CHARM, a novel complementary learning framework designed to implicitly align content while preserving modality-specific advantages through two components: (1) Mutual Perception Unit (MPU), enabling implicit alignment through window-based cross-modal interaction, where modalities serve as both queries and contexts for each other to discover modality-interactive correspondences; (2) A dual-path optimization strategy that decouples training into Collaborative Learning Strategy (CoL) for complementary fusion learning and Individual Enhancement Strategy (InE) for protected modality-specific optimization. Experiments across multiple datasets and backbones indicate that CHARM consistently outperform the baselines, with significant increment on the fragile modalities. This work shifts the focus from model homogenization to harmonization, enabling cross-modal complementarity for true harmony in diversity.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes