AI · HC · Apr 28, 2024

MMAC-Copilot: Multi-modal Agent Collaboration Operating Copilot

arXiv:2404.18074v3 · 5 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the issue of restricted interaction capabilities for AI agents in real-world applications, though it appears incremental as it builds on existing agent collaboration concepts.

The paper tackles the problem of large language model agents having limited versatility and hallucinations in PC application interactions by proposing MMAC-Copilot, a multi-modal agent collaboration framework that achieved a 6.8% average improvement on the GAIA benchmark and demonstrated strong performance on a new Visual Interaction Benchmark.

Large language model agents that interact with PC applications often face limitations due to their singular mode of interaction with real-world environments, leading to restricted versatility and frequent hallucinations. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), a framework that utilizes the collective expertise of diverse agents to enhance their ability to interact with applications. The framework introduces a team collaboration chain, enabling each participating agent to contribute insights based on its specific domain knowledge, effectively reducing the hallucinations associated with knowledge domain gaps. We evaluate MMAC-Copilot using the GAIA benchmark and our newly introduced Visual Interaction Benchmark (VIBench). MMAC-Copilot achieved exceptional performance on GAIA, with an average improvement of 6.8% over existing leading systems. VIBench focuses on non-API-interactable applications across various domains, including 3D gaming, recreation, and office scenarios. MMAC-Copilot also demonstrated remarkable capability on VIBench. We hope this work can inspire further research in this field and provide a more comprehensive assessment of autonomous agents. The anonymous GitHub repository is available at https://anonymous.4open.science/r/ComputerAgentWithVision-3C12

