AIDec 11, 2025
V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier FunctionsMumuksh Tayal, Manan Tayal, Aditya Singh et al.
Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically enforce soft expected-cost constraints, they do not guarantee forward invariance. Conversely, Control Barrier Functions (CBFs) provide rigorous safety guarantees but usually depend on expert-designed barrier functions or full knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to the dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used with a Quadratic Program (QP) formulation to synthesize real-time safe control. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.
20.8LGMar 16
Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow PoliciesMumuksh Tayal, Manan Tayal, Ravi Prakash
Offline safe reinforcement learning (RL) seeks reward-maximizing policies from static datasets under strict safety constraints. Existing methods often rely on soft expected-cost objectives or iterative generative inference, which can be insufficient for safety-critical real-time control. We propose Safe Flow Q-Learning (SafeFQL), which extends FQL to safe offline RL by combining a Hamilton--Jacobi reachability-inspired safety value function with an efficient one-step flow policy. SafeFQL learns the safety value via a self-consistency Bellman recursion, trains a flow policy by behavioral cloning, and distills it into a one-step actor for reward-maximizing safe action selection without rejection sampling at deployment. To account for finite-data approximation error in the learned safety boundary, we add a conformal prediction calibration step that adjusts the safety threshold and provides finite-sample probabilistic safety coverage. Empirically, SafeFQL trades modestly higher offline training cost for substantially lower inference latency than diffusion-style safe generative baselines, which is advantageous for real-time safety-critical deployment. Across boat navigation, and Safety Gymnasium MuJoCo tasks, SafeFQL matches or exceeds prior offline safe RL performance while substantially reducing constraint violations.
ROMar 15, 2025
RISE: Robust Imitation through Stochastic EncodingMumuksh Tayal, Manan Tayal, Ravi Prakash
Ensuring safety in robotic systems remains a fundamental challenge, especially when deploying offline policy-learning methods such as imitation learning in dynamic environments. Traditional behavior cloning (BC) often fails to generalize when deployed without fine-tuning as it does not account for disturbances in observations that arises in real-world, changing environments. To address this limitation, we propose RISE (Robust Imitation through Stochastic Encodings), a novel imitation-learning framework that explicitly addresses erroneous measurements of environment parameters into policy learning via a variational latent representation. Our framework encodes parameters such as obstacle state, orientation, and velocity into a smooth variational latent space to improve test time generalization. This enables an offline-trained policy to produce actions that are more robust to perceptual noise and environment uncertainty. We validate our approach on two robotic platforms, an autonomous ground vehicle and a Franka Emika Panda manipulator and demonstrate improved safety robustness while maintaining goal-reaching performance compared to baseline methods.