LG Apr 28

SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations

arXiv:2604.26181
54.5
AI Analysis

For practitioners deploying multimodal models in resource-constrained environments, SWAN offers a single approach to runtime adaptation that covers modality quality, input complexity, and compute budget together, which existing adaptive networks address only in part.

SWAN is presented as the first adaptive multimodal network to handle runtime variations in modality quality, input complexity, and compute budget simultaneously, reporting up to 49% fewer FLOPs with minimal performance loss on multi-object 3D detection for autonomous driving.

Multimodal deep neural networks deployed in realistic environments must contend with runtime variations: changes in modality quality, overall input complexity, and available platform resources. Current networks struggle with such fluctuations: adaptive networks cannot adhere to a strict compute budget, controller-based networks do not consider input complexity, and statically provisioned networks fail at all of the above. Consequently, they do not extract maximum utility from the computational resources they expend. We present SWAN (Sample and World-Aware Multimodal Network), the first adaptive multimodal network that accomplishes all three goals. SWAN employs a quality-aware controller to assign resources among modalities according to a variable, user-specified maximum budget. Within this budget, an adaptive gating module further optimizes efficiency by scaling layer utilization according to sample complexity. For further gains, SWAN also employs a token dropping module that masks semantically irrelevant multimodal features before performing detection. We evaluate SWAN in the domain of autonomous driving with complex multi-object 3D detection, reducing FLOPs by up to 49% with minimal degradation.
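
To make the three mechanisms in the abstract concrete, here is a minimal Python sketch of how they might fit together. It is not SWAN's implementation. The abstract does not say how the controller, gating module, or token dropping module are built, so this sketch substitutes simple hand-written rules: a proportional budget split, a linear layer count, and a fixed score threshold. All function names, parameters, and numbers (`allocate_budget`, `layers_to_run`, `keep_mask`, the quality and relevance scores) are hypothetical.

```python
import math
from typing import Dict, List


def allocate_budget(quality: Dict[str, float], max_budget: float) -> Dict[str, float]:
    """Split a user-specified maximum budget across modalities.

    Assumption: the split is proportional to a per-modality quality score.
    The paper's quality-aware controller may work quite differently.
    """
    total = sum(quality.values())
    if total <= 0:
        # No modality looks usable: fall back to an even split.
        return {name: max_budget / len(quality) for name in quality}
    return {name: max_budget * q / total for name, q in quality.items()}


def layers_to_run(budget: float, cost_per_layer: float,
                  max_layers: int, complexity: float) -> int:
    """Choose how many layers to execute for one modality.

    The budget sets a hard cap; within that cap, easier samples
    (complexity near 0) use fewer layers. cost_per_layer must be > 0.
    """
    affordable = min(max_layers, int(budget // cost_per_layer))
    if affordable == 0:
        return 0
    complexity = min(1.0, max(0.0, complexity))  # clamp to [0, 1]
    return max(1, math.ceil(affordable * complexity))


def keep_mask(relevance: List[float], threshold: float) -> List[bool]:
    """Mark which tokens stay visible to the detection head.

    Assumption: a fixed threshold on a relevance score stands in for
    the learned token dropping module.
    """
    return [score >= threshold for score in relevance]


if __name__ == "__main__":
    # Hypothetical scene: the camera is degraded, the lidar is clean.
    budgets = allocate_budget({"camera": 0.25, "lidar": 0.75}, max_budget=100.0)
    for name, budget in budgets.items():
        depth = layers_to_run(budget, cost_per_layer=10.0,
                              max_layers=12, complexity=0.5)
        print(name, budget, depth)
    print(keep_mask([0.9, 0.1, 0.5], threshold=0.5))
```

The ordering mirrors the abstract: the budget is fixed first, the per-sample depth is chosen within it, and tokens are masked last. Because the per-modality allocations sum to the maximum budget and the layer count never exceeds what each allocation can pay for, the strict budget holds regardless of sample complexity.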

