LG OC | Jan 30, 2025

Deceptive Sequential Decision-Making via Regularized Policy Optimization

Yerin Kim, Alexander Benvenuti, Bo Chen, Mustafa Karabag, Abhishek Kulkarni, Nathaniel D. Bastian, Ufuk Topcu, Matthew Hale

arXiv:2501.18803v27.14 citations | h-index: 52

Originality: Highly original

AI Analysis

This paper addresses security and privacy concerns for autonomous systems operating in adversarial environments and presents a novel approach to deception in multi-agent settings.

The paper tackles the problem of adversaries inferring sensitive information from autonomous systems. It proposes a deceptive sequential decision-making framework that actively misleads adversaries about the system's reward function while still achieving at least 97% of the optimal non-deceptive reward.

Autonomous systems are increasingly expected to operate in the presence of adversaries, though adversaries may infer sensitive information simply by observing a system. Therefore, we present a deceptive sequential decision-making framework that not only conceals sensitive information, but actively misleads adversaries about it. We model autonomous systems as Markov decision processes, with adversaries using inverse reinforcement learning to recover reward functions. To counter them, we present three regularization strategies for policy synthesis problems that actively deceive an adversary about a system's reward. "Diversionary deception" leads an adversary to draw any false conclusion about the system's reward function. "Targeted deception" leads an adversary to draw a specific false conclusion about the system's reward function. "Equivocal deception" leads an adversary to infer that the real reward and a false reward both explain the system's behavior. We show how each form of deception can be implemented in policy optimization problems and analytically bound the loss in total accumulated reward induced by deception. Next, we evaluate these developments in a multi-agent setting. We show that diversionary, targeted, and equivocal deception all steer the adversary to false beliefs while still attaining a total accumulated reward that is at least 97% of its optimal, non-deceptive value.
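To make the idea of trading true reward for deception concrete, the following is a minimal sketch and not the paper's formulation. It assumes a hypothetical 5-state chain MDP, a hypothetical decoy reward `r_decoy`, and a simple additive regularizer with weight `lam` that rewards behavior consistent with the decoy. The paper's actual regularizers for diversionary, targeted, and equivocal deception are not given in the abstract and are not reproduced here.

```python
import numpy as np

GAMMA = 0.95
N_STATES = 5  # hypothetical chain 0..4; actions: 0 = left, 1 = right


def chain_transitions(n):
    """Deterministic transitions P[a, s, s'] for a chain with walls at both ends."""
    P = np.zeros((2, n, n))
    for s in range(n):
        P[0, s, max(s - 1, 0)] = 1.0      # move left
        P[1, s, min(s + 1, n - 1)] = 1.0  # move right
    return P


def greedy_policy(P, r, gamma=GAMMA, iters=1000):
    """Value iteration for a state-based reward r[s]; returns a greedy policy."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = r[:, None] + gamma * np.einsum("ast,t->sa", P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)


def policy_return(P, r, policy, start, gamma=GAMMA):
    """Discounted return of a deterministic policy under reward r, from `start`."""
    n = P.shape[1]
    P_pi = P[policy, np.arange(n)]  # row s is the transition row for policy[s]
    V = np.linalg.solve(np.eye(n) - gamma * P_pi, r)
    return V[start]


P = chain_transitions(N_STATES)
r_true = np.zeros(N_STATES)
r_true[4] = 1.0   # the system's real goal (assumed for illustration)
r_decoy = np.zeros(N_STATES)
r_decoy[0] = 1.0  # the false goal shown to the adversary (assumed)


def deceptive_policy(lam):
    """Assumed regularizer: optimize the true reward plus lam times the decoy reward."""
    return greedy_policy(P, r_true + lam * r_decoy)


for lam in (0.0, 0.5, 5.0):
    pi = deceptive_policy(lam)
    print(lam, pi, policy_return(P, r_true, pi, start=2))
```

In this toy setting, `lam = 0` recovers the non-deceptive optimum, while a large `lam` makes the behavior fully consistent with the decoy at the cost of all true reward. The point of bounding the reward loss analytically, as the abstract describes, is to choose the regularization so that the adversary is misled while the loss stays small.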
