Operationalizing Contextual Integrity in Privacy-Conscious Assistants
This work addresses privacy concerns for users of AI assistants by ensuring that information sharing aligns with contextual expectations, though its contribution is incremental because it applies an existing framework.
The paper tackles the problem of AI assistants sharing inappropriate user information by operationalizing the contextual integrity framework to steer their behavior, and it reports strong results on a novel form-filling benchmark.
Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize contextual integrity (CI), a framework that equates privacy with the appropriate flow of information in a given context. In particular, we design and evaluate a number of strategies to steer assistants' information-sharing actions to be CI-compliant. Our evaluation is based on a novel form-filling benchmark composed of human annotations of common webform applications, and it reveals that prompting frontier LLMs to perform CI-based reasoning yields strong results.
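To make the idea of an "appropriate flow of information in a given context" concrete, the following is a minimal sketch, not the paper's method. It assumes the standard CI description of a flow by sender, recipient, subject, information type, and transmission principle, and it replaces the LLM-based reasoning with a hand-written lookup table. The names `InformationFlow`, `APPROPRIATE_FLOWS`, `is_appropriate`, and `fill_form`, along with the example contexts and field names, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationFlow:
    """One candidate flow, described by the usual CI parameters."""
    sender: str
    recipient: str
    subject: str
    information_type: str
    transmission_principle: str


# Hypothetical norms: (context, information type) pairs judged appropriate.
# In the paper these judgments come from human annotation and LLM reasoning,
# not from a static table like this one.
APPROPRIATE_FLOWS = {
    ("job_application", "full_name"),
    ("job_application", "email"),
    ("job_application", "work_history"),
    ("medical_intake", "full_name"),
    ("medical_intake", "health_conditions"),
}


def is_appropriate(context: str, flow: InformationFlow) -> bool:
    """Return True if sharing this information type fits the context's norms."""
    return (context, flow.information_type) in APPROPRIATE_FLOWS


def fill_form(context: str, recipient: str, fields: list[str],
              user_data: dict[str, str]) -> dict[str, str]:
    """Fill only those form fields whose flow is appropriate in this context."""
    filled = {}
    for field in fields:
        if field not in user_data:
            continue  # nothing to share for this field
        flow = InformationFlow(
            sender="assistant",
            recipient=recipient,
            subject="user",
            information_type=field,
            transmission_principle="form submission",
        )
        if is_appropriate(context, flow):
            filled[field] = user_data[field]
    return filled


user_data = {
    "full_name": "Ada Example",
    "email": "ada@example.com",
    "health_conditions": "asthma",
}
result = fill_form(
    context="job_application",
    recipient="employer",
    fields=["full_name", "health_conditions", "email"],
    user_data=user_data,
)
print(result)
```

In this sketch the health field is withheld from the job application but would be shared on a medical intake form, which reflects the CI view that the same piece of information can be appropriate in one context and inappropriate in another.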