CR | LG | Dec 26, 2023

Reinforcement Unlearning

arXiv:2312.15910v5 | 14 citations | h-index: 21 | NDSS
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns for environment owners in reinforcement learning. It lets an owner revoke an agent's access to training data, which supports compliance with data protection regulations. The advance is incremental, since it extends existing unlearning concepts to a new domain.

The paper tackles machine unlearning in reinforcement learning, where agents memorize features of their training environments. It proposes two methods for revoking entire environments, decremental reinforcement learning and environment poisoning attacks, and introduces an environment inference attack to evaluate them. The reported outcome is effective unlearning with minimal performance degradation in the remaining environments.

Machine unlearning refers to the process of mitigating the influence of specific training data on machine learning models based on removal requests from data owners. However, one important area that has been largely overlooked in the research of unlearning is reinforcement learning. Reinforcement learning focuses on training an agent to make optimal decisions within an environment to maximize its cumulative rewards. During the training, the agent tends to memorize the features of the environment, which raises a significant concern about privacy. As per data protection regulations, the owner of the environment holds the right to revoke access to the agent's training data, thus necessitating the development of a novel and pressing research field, known as reinforcement unlearning. Reinforcement unlearning focuses on revoking entire environments rather than individual data samples. This unique characteristic presents three distinct challenges: 1) how to propose unlearning schemes for environments; 2) how to avoid degrading the agent's performance in remaining environments; and 3) how to evaluate the effectiveness of unlearning. To tackle these challenges, we propose two reinforcement unlearning methods. The first method is based on decremental reinforcement learning, which aims to erase the agent's previously acquired knowledge gradually. The second method leverages environment poisoning attacks, which encourage the agent to learn new, albeit incorrect, knowledge to remove the unlearning environment. Particularly, to tackle the third challenge, we introduce the concept of "environment inference attack" to evaluate the unlearning outcomes.
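
To make the poisoning idea in the abstract concrete, here is a minimal toy sketch. It is not the paper's algorithm. It assumes a single five-state chain world, a tabular Q-function, and a hypothetical form of poisoning in which the reward is moved to the opposite end of the chain; the names `step`, `train`, and `reaches_goal` are invented for this illustration. It only shows how continued training in a poisoned copy of an environment can overwrite what the agent learned about the original.

```python
# Toy illustration only: the chain world, the reward relocation used as
# "poisoning", and all names below are assumptions, not taken from the paper.
N_STATES, N_ACTIONS = 5, 2   # actions: 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.5      # discount factor and learning rate


def step(state, action, poisoned=False):
    """Deterministic chain world.

    Clean version: reward 1 for arriving at the right end.
    Poisoned version: reward 1 for arriving at the left end instead.
    """
    if action == 1:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    goal = 0 if poisoned else N_STATES - 1
    reward = 1.0 if nxt == goal else 0.0
    return nxt, reward


def train(q, sweeps, poisoned=False):
    """Apply Q-learning updates, sweeping over every state-action pair."""
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                nxt, r = step(s, a, poisoned)
                target = r + GAMMA * max(q[nxt])
                q[s][a] += ALPHA * (target - q[s][a])
    return q


def reaches_goal(q, start=0, max_steps=20):
    """Follow the greedy policy in the CLEAN environment."""
    s = start
    for _ in range(max_steps):
        greedy = max(range(N_ACTIONS), key=lambda a: q[s][a])
        s, _ = step(s, greedy)
        if s == N_STATES - 1:
            return True
    return False


q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
q = train(q, sweeps=200)                  # learn the clean environment
learned = reaches_goal(q)
q = train(q, sweeps=200, poisoned=True)   # keep training on the poisoned copy
still_learned = reaches_goal(q)
print(learned, still_learned)
```

In this toy setting the greedy policy reaches the original goal after the first training phase and no longer does after the second. The sketch leaves out the part the abstract identifies as the harder problem: preserving performance in the remaining environments while one environment is revoked.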

Code Implementations: 1 repo
