CV · Apr 16

Chain of Modality: From Static Fusion to Dynamic Orchestration in Omni-MLLMs

arXiv:2604.14520 · 22.5 · h-index: 5
Predicted impact: top 23% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For researchers in multimodal AI, CoM addresses the critical bottleneck of static fusion in Omni-MLLMs, offering a training-free or data-efficient solution to improve multimodal performance.

Omni-MLLMs suffer from a performance paradox where unimodal baselines outperform multimodal inference due to static fusion topologies. The proposed Chain of Modality (CoM) framework dynamically orchestrates input topologies and bifurcates cognitive execution, achieving robust generalization across benchmarks.

Omni-modal Large Language Models (Omni-MLLMs) promise a unified integration of diverse sensory streams. However, recent evaluations reveal a critical performance paradox: unimodal baselines frequently outperform joint multimodal inference. We trace this perceptual fragility to the static fusion topologies universally employed by current models, identifying two structural pathologies: positional bias in sequential inputs and alignment traps in interleaved formats, which systematically distort attention regardless of task semantics. To resolve this functional rigidity, we propose Chain of Modality (CoM), an agentic framework that transitions multimodal fusion from passive concatenation to dynamic orchestration. CoM adaptively orchestrates input topologies, switching among parallel, sequential, and interleaved pathways to neutralize structural biases. Furthermore, CoM bifurcates cognitive execution into two task-aligned pathways: a streamlined "Direct-Decide" path for direct perception and a structured "Reason-Decide" path for analytical auditing. Operating in either a training-free or a data-efficient SFT setting, CoM achieves robust and consistent generalization across diverse benchmarks.
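To make the three input topologies named in the abstract concrete, here is a minimal Python sketch of how the same per-modality chunks could be laid out as sequential, interleaved, or parallel inputs. It is an illustration under stated assumptions, not the paper's implementation: the function `arrange`, the `streams` dictionary, and the chunk labels are all hypothetical, and the abstract does not say how CoM selects a topology, so that step is left out.

```python
from itertools import chain, zip_longest


def arrange(streams, topology):
    """Lay out per-modality chunk lists under one input topology.

    Hypothetical illustration only; not the CoM implementation.
    `streams` maps a modality name to its ordered list of chunks.
    Returns a list of prompts, each prompt being a list of chunks.
    """
    if topology == "sequential":
        # One prompt: every chunk of the first modality, then the next.
        return [list(chain.from_iterable(streams.values()))]
    if topology == "interleaved":
        # One prompt: alternate modalities chunk by chunk.
        mixed = chain.from_iterable(zip_longest(*streams.values()))
        return [[chunk for chunk in mixed if chunk is not None]]
    if topology == "parallel":
        # One prompt per modality, each handled independently.
        return [list(chunks) for chunks in streams.values()]
    raise ValueError(f"unknown topology: {topology}")


# Assumed toy input: two video chunks and two audio chunks.
streams = {"video": ["v0", "v1"], "audio": ["a0", "a1"]}
sequential = arrange(streams, "sequential")    # [["v0", "v1", "a0", "a1"]]
interleaved = arrange(streams, "interleaved")  # [["v0", "a0", "v1", "a1"]]
parallel = arrange(streams, "parallel")        # [["v0", "v1"], ["a0", "a1"]]
```

In the sequential layout, one modality always occupies the later positions, which is where the positional bias described in the abstract could arise. The interleaved layout forces a chunk-by-chunk pairing whether or not the streams are actually aligned.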

