Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates
This work addresses the challenge designers face in ensuring that AI agents comply with ethical obligations. Its contribution is incremental in that it builds on existing deontic logic and RL frameworks.
The paper tackles the problem of ensuring that reinforcement learning agents adhere to ethical obligations. It proposes a new deontic logic for specifying and verifying these obligations, and it introduces algorithms for model checking and policy modification, demonstrated on abstracted neural policies and gridworld environments.
When designing agents for operation in uncertain environments, designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict. These obligations include ethical and social obligations, permissions, and prohibitions, which constrain how the agent achieves its mission and executes its policy. We propose a new deontic logic, Expected Act Utilitarian deontic logic, to enable this reasoning at design time: specifying and verifying the agent's strategic obligations, then modifying its policy from a reference policy to meet those obligations. Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs. We introduce two algorithms: one for model-checking whether an RL agent has the right strategic obligations, and one for modifying a reference decision policy to make it meet obligations expressed in our logic. We illustrate our algorithms on DAC-MDPs, which accurately abstract neural decision policies, and on toy gridworld environments.
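To make the check-then-modify workflow concrete, the following is a minimal sketch and not the algorithms of this paper. It assumes a simplified reading in which an agent "ought" to take an action in a state exactly when that action maximizes expected utility over a finite horizon. The toy MDP, the state and action names, and the helpers `q_values`, `check_policy`, and `repair_policy` are all hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# mdp[state][action] = list of (probability, next_state, utility) outcomes.
# A state with no actions is terminal. (Hypothetical encoding for this sketch.)
MDP = Dict[State, Dict[Action, List[Tuple[float, State, float]]]]


def q_values(mdp: MDP, horizon: int) -> Dict[State, Dict[Action, float]]:
    """Finite-horizon expected utility of each action, by backward induction."""
    values = {s: 0.0 for s in mdp}
    q: Dict[State, Dict[Action, float]] = {s: {} for s in mdp}
    for _ in range(horizon):
        q = {
            s: {a: sum(p * (u + values[s2]) for p, s2, u in outcomes)
                for a, outcomes in actions.items()}
            for s, actions in mdp.items()
        }
        values = {s: max(q[s].values()) if q[s] else 0.0 for s in mdp}
    return q


def obligatory_actions(q_s: Dict[Action, float], tol: float = 1e-9) -> List[Action]:
    """Assumed reading of 'ought': the actions that maximize expected utility."""
    best = max(q_s.values())
    return [a for a, v in q_s.items() if v >= best - tol]


def check_policy(policy: Dict[State, Action], q) -> List[State]:
    """Return the states where the policy picks a non-obligatory action."""
    return [s for s, a in policy.items() if a not in obligatory_actions(q[s])]


def repair_policy(policy: Dict[State, Action], q) -> Dict[State, Action]:
    """Change the reference policy only in the states that violate the check."""
    repaired = dict(policy)
    for s in check_policy(policy, q):
        repaired[s] = obligatory_actions(q[s])[0]
    return repaired


# Toy example: a risky shortcut versus a safe detour.
toy_mdp: MDP = {
    "start": {
        "shortcut": [(0.5, "goal", 10.0), (0.5, "harm", -20.0)],
        "detour": [(1.0, "goal", 8.0)],
    },
    "goal": {},
    "harm": {},
}
reference_policy = {"start": "shortcut"}

q = q_values(toy_mdp, horizon=1)
violations = check_policy(reference_policy, q)
repaired = repair_policy(reference_policy, q)
print(violations, repaired)
```

In this toy case the shortcut has expected utility -5 and the detour 8, so the reference policy is flagged in `start` and the repair swaps in the detour while leaving every other state untouched. Because the check reports which states violate the assumed obligation, the trade-off is visible directly rather than being folded into a reward signal.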