AI · May 29

ConSensus: Multi-Agent Collaboration for Multimodal Sensing

Hyungjun Yoon, Mohammad Malekzadeh, Sung-Ju Lee, Fahim Kawsar, Lorena Qendro

arXiv:2601.06453 · 91.92 citations · h-index: 36 · Has Code

AI Analysis

This work offers a more robust and efficient approach to real-world multimodal sensing, particularly for applications that require accurate interpretation of diverse sensor data. Relative to existing multi-agent debate methods, it is an incremental improvement.

This paper addresses the challenge of interpreting heterogeneous multimodal sensor data with large language models (LLMs), where a single LLM often fails to reason coherently across modalities. The authors propose ConSensus, a training-free multi-agent collaboration framework that decomposes tasks into specialized, modality-aware agents and aggregates their interpretations with a hybrid fusion mechanism. ConSensus improves average accuracy by 7.1% over the single-agent baseline, and it matches or exceeds iterative multi-agent debate methods while reducing average fusion token cost 12.7-fold relative to them.
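To make the decomposition step concrete, here is a minimal sketch of routing each modality to its own specialized agent and skipping modalities that are missing. In ConSensus the agents are LLM-based; here plain Python functions stand in for them. The names `Interpretation`, `run_modality_agents`, `heart_rate_agent`, and `accelerometer_agent`, and the thresholds they use, are hypothetical illustrations, not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Interpretation:
    """One agent's reading of a single modality (hypothetical structure)."""
    modality: str
    label: str
    rationale: str


# Hypothetical agent signature: (modality name, readings) -> Interpretation.
# A ConSensus agent would be an LLM prompted for one modality; plain
# functions stand in for those calls here.
AgentFn = Callable[[str, List[float]], Interpretation]


def run_modality_agents(
    sample: Dict[str, Optional[List[float]]],
    agents: Dict[str, AgentFn],
) -> List[Interpretation]:
    """Send each available modality to its own specialized agent.

    Missing modalities (None or empty) are skipped instead of being passed
    on, so one absent sensor does not block interpretation of the others.
    """
    results = []
    for modality, readings in sample.items():
        if not readings or modality not in agents:
            continue
        results.append(agents[modality](modality, readings))
    return results


# Toy agents with made-up thresholds, for illustration only.
def heart_rate_agent(modality: str, readings: List[float]) -> Interpretation:
    mean_bpm = sum(readings) / len(readings)
    label = "active" if mean_bpm > 100 else "resting"
    return Interpretation(modality, label, f"mean heart rate {mean_bpm:.0f} bpm")


def accelerometer_agent(modality: str, readings: List[float]) -> Interpretation:
    peak = max(abs(r) for r in readings)
    label = "active" if peak > 0.5 else "resting"
    return Interpretation(modality, label, f"peak acceleration {peak:.2f} g")


sample = {
    "heart_rate": [72.0, 75.0, 71.0],
    "accelerometer": [0.01, 0.02, 0.015],
    "audio": None,  # missing modality: no agent is called for it
}
agents = {"heart_rate": heart_rate_agent, "accelerometer": accelerometer_agent}

for interp in run_modality_agents(sample, agents):
    print(interp)
```

Keeping each agent's output as a label plus a short rationale leaves room for a later fusion step to use either the labels alone or the full text.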

Large language models (LLMs) are increasingly grounded in sensor data to perceive and reason about human physiology and the physical world. However, accurately interpreting heterogeneous multimodal sensor data remains a fundamental challenge. We show that a single monolithic LLM often fails to reason coherently across modalities, leading to incomplete interpretations and prior-knowledge bias. We introduce ConSensus, a training-free multi-agent collaboration framework that decomposes multimodal sensing tasks into specialized, modality-aware agents. To aggregate agent-level interpretations, we propose a hybrid fusion mechanism that balances semantic aggregation, which enables cross-modal reasoning and contextual understanding, with statistical consensus, which provides robustness through agreement across modalities. Because the two approaches have complementary failure modes, combining them enables reliable inference under sensor noise and missing data. We evaluate ConSensus on five diverse multimodal sensing benchmarks, demonstrating an average accuracy improvement of 7.1% over the single-agent baseline. Furthermore, ConSensus matches or exceeds the performance of iterative multi-agent debate methods while achieving a 12.7-fold reduction in average fusion token cost through a single-round hybrid fusion protocol, yielding a robust and efficient solution for real-world multimodal sensing tasks. The source code is available at https://github.com/nokia/multi-agent-collaboration-for-multimodal-sensing.
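The hybrid fusion idea can be sketched as follows, under stated assumptions. Statistical consensus is modeled as a majority vote over agent labels, and semantic aggregation as a single call to an aggregator that sees every agent's rationale (an LLM in the paper, a toy function here). The abstract does not say how the two are balanced; the agreement-threshold gate below and its 0.6 value are our illustrative assumptions, not the authors' protocol. The names `statistical_consensus`, `hybrid_fuse`, and `toy_semantic` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Interpretation:
    """One modality agent's output (hypothetical structure)."""
    modality: str
    label: str
    rationale: str


# Hypothetical stand-in for an LLM aggregator that reads every agent's
# label and rationale and returns one fused label.
SemanticAggregator = Callable[[List[Interpretation]], str]


def statistical_consensus(interps: List[Interpretation]) -> Tuple[str, float]:
    """Majority vote over agent labels; returns (label, agreement ratio).

    Ties go to the label seen first, because Counter.most_common keeps
    first-encountered order among equal counts.
    """
    counts = Counter(i.label for i in interps)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(interps)


def hybrid_fuse(
    interps: List[Interpretation],
    semantic: SemanticAggregator,
    min_agreement: float = 0.6,  # illustrative threshold, not from the paper
) -> str:
    """Single-round fusion (assumed gating rule): accept the vote when agents
    largely agree, otherwise make exactly one call to the semantic aggregator."""
    if not interps:
        raise ValueError("no agent interpretations to fuse")
    label, agreement = statistical_consensus(interps)
    if agreement >= min_agreement:
        return label
    return semantic(interps)


def toy_semantic(interps: List[Interpretation]) -> str:
    """Toy aggregator: trust the motion sensor when agents disagree."""
    for interp in interps:
        if interp.modality == "accelerometer":
            return interp.label
    return interps[0].label


agree = [
    Interpretation("heart_rate", "walking", "moderately elevated pulse"),
    Interpretation("accelerometer", "walking", "periodic steps"),
    Interpretation("audio", "sitting", "quiet room"),
]
split = [
    Interpretation("heart_rate", "running", "high pulse"),
    Interpretation("accelerometer", "walking", "periodic steps"),
    Interpretation("audio", "sitting", "quiet room"),
]
print(hybrid_fuse(agree, toy_semantic))  # 2/3 agree, so the vote decides
print(hybrid_fuse(split, toy_semantic))  # 1/3 agree, so the aggregator decides
```

Calling the aggregator at most once per sample mirrors the abstract's "single-round" description, as opposed to debate methods that exchange messages over several rounds. The vote supplies robustness when modalities agree, and the semantic step handles cases where cross-modal reasoning is needed to break a disagreement.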
