LGAIJan 30

Sample Complexity Analysis for Constrained Bilevel Reinforcement Learning

arXiv:2602.00282v1h-index: 9
Originality Incremental advance
AI Analysis

This work addresses a gap in theoretical understanding for constrained bilevel RL, which is incremental but relevant for domains like meta-learning and RL from human feedback.

The authors tackled the theoretical analysis of constrained bilevel reinforcement learning, obtaining an iteration complexity of O(ε^{-2}) and sample complexity of Õ(ε^{-4}) for their proposed Constrained Bilevel Subgradient Optimization algorithm.

Several important problem settings within the literature of reinforcement learning (RL), such as meta-learning, hierarchical learning, and RL from human feedback (RL-HF), can be modelled as bilevel RL problems. A lot has been achieved in these domains empirically; however, the theoretical analysis of bilevel RL algorithms hasn't received a lot of attention. In this work, we analyse the sample complexity of a constrained bilevel RL algorithm, building on the progress in the unconstrained setting. We obtain an iteration complexity of $O(ε^{-2})$ and sample complexity of $\tilde{O}(ε^{-4})$ for our proposed algorithm, Constrained Bilevel Subgradient Optimization (CBSO). We use a penalty-based objective function to avoid the issue of primal-dual gap and hyper-gradient in the context of a constrained bilevel problem setting. The penalty-based formulation to handle constraints requires analysis of non-smooth optimization. We are the first ones to analyse the generally parameterized policy gradient-based RL algorithm with a non-smooth objective function using the Moreau envelope.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes