LG AI | Apr 10

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

Maksim Anisimov, Francesco Belardinelli, Matthew Wicker

arXiv:2604.09452

AI Analysis

This work addresses the safety-critical deployment of RL agents in changing environments, offering a novel approach that keeps safety guarantees intact during policy updates.

The paper tackles the challenge of updating reinforcement learning policies in non-stationary environments while preserving their safety guarantees. It proposes a method that provides a priori, provable safety on source tasks during adaptation. Empirical validation shows strong adaptation with safety preserved, whereas regularisation-based baselines forget their safety constraints.

Safety guarantees are a prerequisite to the deployment of reinforcement learning (RL) agents in safety-critical tasks. Often, deployment environments exhibit non-stationary dynamics or are subject to changing performance goals, requiring updates to the learned policy. This leads to a fundamental challenge: how to update an RL policy while preserving its safety properties on previously encountered tasks? The majority of current approaches either do not provide formal guarantees or verify policy safety only a posteriori. We propose a novel a priori approach to safe policy updates in continual RL by introducing the Rashomon set: a region in policy parameter space certified to meet safety constraints within the demonstration data distribution. We then show that one can provide formal, provable guarantees for arbitrary RL algorithms used to update a policy by projecting their updates onto the Rashomon set. Empirically, we validate this approach across grid-world navigation environments (Frozen Lake and Poisoned Apple), where we guarantee a priori, provably deterministic safety on the source task during downstream adaptation. In contrast, we observe that regularisation-based baselines experience catastrophic forgetting of safety constraints, while our approach enables strong adaptation with provable guarantees that safety is preserved.
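The projection step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the certified region is an axis-aligned box around the source policy's parameters; the abstract does not state the shape of the Rashomon set or how it is certified, and the names `project_onto_box`, `projected_update`, `in_box`, and `radius` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def project_onto_box(theta: Sequence[float],
                     lower: Sequence[float],
                     upper: Sequence[float]) -> List[float]:
    """Euclidean projection onto an axis-aligned box: clip each coordinate."""
    return [min(max(t, lo), hi) for t, lo, hi in zip(theta, lower, upper)]


def projected_update(theta: Sequence[float],
                     step: Sequence[float],
                     lower: Sequence[float],
                     upper: Sequence[float]) -> List[float]:
    """Apply an update proposed by any RL algorithm, then project the result
    back into the (assumed) certified region."""
    proposed = [t + s for t, s in zip(theta, step)]
    return project_onto_box(proposed, lower, upper)


def in_box(theta: Sequence[float],
           lower: Sequence[float],
           upper: Sequence[float]) -> bool:
    """Check that every parameter lies inside the box."""
    return all(lo <= t <= hi for t, lo, hi in zip(theta, lower, upper))


# Hypothetical source-task parameters and an assumed certified radius.
theta_source = [0.5, -1.0, 2.0]
radius = 0.1
lower = [t - radius for t in theta_source]
upper = [t + radius for t in theta_source]

# A step from some downstream learner; two coordinates would leave the box.
step = [0.3, -0.05, -0.5]
theta_new = projected_update(theta_source, step, lower, upper)
print(theta_new, in_box(theta_new, lower, upper))
```

Under this assumption, every iterate stays inside the box regardless of which algorithm produced the step, which is the sense in which the guarantee holds a priori rather than being checked after the update. Whether that region is actually safe depends on the certification procedure, which this sketch does not model.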
