CR CL CY Nov 7, 2025

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

arXiv:2511.05359v1, 9 citations, h-index: 3

Originality: Incremental advance

AI Analysis

The paper addresses safety challenges for users of autonomous language model agents in domains such as travel and real estate. The advance is incremental because it builds on prior single-agent safety work.

The paper addresses safety in multi-agent ecosystems by introducing ConVerse, a benchmark for evaluating privacy and security risks in agent-to-agent conversations. Its evaluation reveals persistent vulnerabilities: privacy attacks succeed in up to 88% of cases and security breaches in up to 60%.

As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a central challenge. Interactions between personal assistants and external service providers expose a core tension between utility and protection: effective collaboration requires information sharing, yet every exchange creates new attack surfaces. We introduce ConVerse, a dynamic benchmark for evaluating privacy and security risks in agent-agent interactions. ConVerse spans three practical domains (travel, real estate, insurance) with 12 user personas and over 864 contextually grounded attacks (611 privacy, 253 security). Unlike prior single-agent settings, it models autonomous, multi-turn agent-to-agent conversations where malicious requests are embedded within plausible discourse. Privacy is tested through a three-tier taxonomy assessing abstraction quality, while security attacks target tool use and preference manipulation. Evaluating seven state-of-the-art models reveals persistent vulnerabilities; privacy attacks succeed in up to 88% of cases and security breaches in up to 60%, with stronger models leaking more. By unifying privacy and security within interactive multi-agent contexts, ConVerse reframes safety as an emergent property of communication.
