CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems
CAML addresses incomplete data coverage in dynamic environments such as connected autonomous vehicles, where it can cause decision-making blind spots, by extending multi-modal learning to multi-agent settings. The contribution is incremental, since it builds on existing single-agent and multi-agent approaches.
The paper tackles the problem of modalities that are missing at inference time in multi-agent systems. It proposes CAML, a framework that enables collaborative multi-modal training and inference with reduced modalities, achieving up to a 58.1% improvement in accident detection for autonomous vehicles and up to a 10.6% improvement in mIoU for semantic segmentation.
Multi-modal learning has become a crucial technique for improving the performance of machine learning applications across domains such as autonomous driving, robotics, and perception systems. However, in certain scenarios, particularly in resource-constrained environments, some modalities available during training may be absent during inference. While existing frameworks effectively utilize multiple data sources during training and enable inference with reduced modalities, they are primarily designed for single-agent settings. This poses a critical limitation in dynamic environments such as connected autonomous vehicles (CAV), where incomplete data coverage can lead to decision-making blind spots. Conversely, some works explore multi-agent collaboration but without addressing missing modality at test time. To overcome these limitations, we propose Collaborative Auxiliary Modality Learning (CAML), a novel multi-modal multi-agent framework that enables agents to collaborate and share multi-modal data during training, while allowing inference with reduced modalities during testing. Experimental results in collaborative decision-making for CAV in accident-prone scenarios demonstrate that CAML achieves up to a 58.1% improvement in accident detection. Additionally, we validate CAML on real-world aerial-ground robot data for collaborative semantic segmentation, achieving up to a 10.6% improvement in mIoU.
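For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of how mean Intersection over Union (mIoU) is commonly computed. It is not taken from the paper: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the paper's evaluation protocol may differ.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flat lists of integer class labels.

    Assumption (not from the paper): classes that appear in neither
    `pred` nor `target` are skipped rather than counted as 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps (intersection) or in either map (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class c is absent from both maps
        ious.append(inter / union)

    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Toy 2x3 label maps, flattened row by row (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
# Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2, so the mean is 0.5.
```

Conventions for absent classes vary between benchmarks, so reported mIoU values are only comparable when the same convention is used.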