60.3 RO Jun 4
Accelerating and Scaling MPC-Guided Reinforcement Learning for Humanoid Locomotion and Manipulation
Junheng Li, Liang Wu, Sergio A. Esteban et al.
In humanoid motion control, model predictive control (MPC) offers physically grounded prediction and constraint handling, while reinforcement learning (RL) enables robust whole-body skills through large-scale simulation. However, using MPC inside RL often requires time-consuming problem construction or excessive training overhead, making such frameworks difficult to justify in practice. This work studies efficient training-time MPC guidance for humanoid locomotion and manipulation, termed MPC-RL. We introduce a centroidal-dynamics MPC reward formulation that leverages guidance from MPC trajectories in training time. To make this practical in massively parallel RL, we develop $π^n$MPC, a parallel-in-horizon and construction-free batched GPU MPC solver that operates directly on time-varying dynamics to avoid high memory usage and pre-compilation. Through a variety of comparative studies and hardware validations, we have found that MPC-RL achieves superior performance in locomotion and manipulation skills. The code base is available at https://github.com/junhengl/mpc-rl.
OC Dec 5, 2016
Control Barrier Function Based Quadratic Programs for Safety Critical Systems
Aaron D. Ames, Xiangru Xu, Jessy W. Grizzle et al.
Safety critical systems involve the tight coupling between potentially conflicting control objectives and safety constraints. As a means of creating a formal framework for controlling systems of this form, and with a view toward automotive applications, this paper develops a methodology that allows safety conditions -- expressed as control barrier functions -- to be unified with performance objectives -- expressed as control Lyapunov functions -- in the context of real-time optimization-based controllers. Safety conditions are specified in terms of forward invariance of a set, and are verified via two novel generalizations of barrier functions; in each case, the existence of a barrier function satisfying Lyapunov-like conditions implies forward invariance of the set, and the relationship between these two classes of barrier functions is characterized. In addition, each of these formulations yields a notion of control barrier function (CBF), providing inequality constraints in the control input that, when satisfied, again imply forward invariance of the set. Through these constructions, CBFs can naturally be unified with control Lyapunov functions (CLFs) in the context of a quadratic program (QP); this allows for the achievement of control objectives (represented by CLFs) subject to conditions on the admissible states of the system (represented by CBFs). The mediation of safety and performance through a QP is demonstrated on adaptive cruise control and lane keeping, two automotive control problems that present both safety and performance considerations coupled with actuator bounds.
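For orientation, the kind of QP this abstract describes is usually written as follows in the later literature. This is a sketch in the common "zeroing" CBF form for a control-affine system $\dot{x} = f(x) + g(x)u$ with safe set $\{x : h(x) \ge 0\}$; the symbols $H$, $p$, $\gamma$, $\alpha$, and the relaxation variable $\delta$ are notation assumed here and do not appear in the abstract.

```latex
\begin{aligned}
(u^*, \delta^*) = \operatorname*{arg\,min}_{(u,\delta)} \quad
  & \tfrac{1}{2}\, u^\top H u + p\,\delta^2 \\
\text{s.t.} \quad
  & L_f V(x) + L_g V(x)\, u \le -\gamma V(x) + \delta
    && \text{(CLF, relaxed: performance)} \\
  & L_f h(x) + L_g h(x)\, u \ge -\alpha\big(h(x)\big)
    && \text{(CBF, hard: safety)}
\end{aligned}
```

Relaxing only the CLF constraint is what lets the safety constraint take priority when the two conflict; actuator bounds, when present, enter as additional linear constraints on $u$.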
SY Mar 27, 2019
Control Barrier Functions: Theory and Applications
Aaron D. Ames, Samuel Coogan, Magnus Egerstedt et al.
This paper provides an introduction and overview of recent work on control barrier functions and their use to verify and enforce safety properties in the context of (optimization based) safety-critical controllers. We survey the main technical results and discuss applications to several domains including robotic systems.
OC Dec 5, 2016
Robustness of Control Barrier Functions for Safety Critical Control
Xiangru Xu, Paulo Tabuada, Jessy W. Grizzle et al.
Barrier functions (also called certificates) have been an important tool for the verification of hybrid systems, and have also played important roles in optimization and multi-objective control. The extension of a barrier function to a controlled system results in a control barrier function. This can be thought of as being analogous to how Sontag extended Lyapunov functions to control Lyapunov functions in order to enable controller synthesis for stabilization tasks. A control barrier function enables controller synthesis for safety requirements specified by forward invariance of a set using a Lyapunov-like condition. This paper develops several important extensions to the notion of a control barrier function. The first involves robustness under perturbations to the vector field defining the system. Input-to-State stability conditions are given that provide for forward invariance, when disturbances are present, of a "relaxation" of the set rendered invariant without disturbances. A control barrier function can be combined with a control Lyapunov function in a quadratic program to achieve a control objective subject to safety guarantees. The second result of the paper gives conditions for the control law obtained by solving the quadratic program to be Lipschitz continuous and therefore to give rise to well-defined solutions of the resulting closed-loop system.
OC May 5, 2017
Correctness Guarantees for the Composition of Lane Keeping and Adaptive Cruise Control
Xiangru Xu, Jessy W. Grizzle, Paulo Tabuada et al.
This paper develops a control approach with correctness guarantees for the simultaneous operation of lane keeping and adaptive cruise control. The safety specifications for these driver assistance modules are expressed in terms of set invariance. Control barrier functions are used to design a family of control solutions that guarantee the forward invariance of a set, which implies satisfaction of the safety specifications. The control barrier functions are synthesized through a combination of sum-of-squares program and physics-based modeling and optimization. A real-time quadratic program is posed to combine the control barrier functions with the performance-based controllers, which can be either expressed as control Lyapunov function conditions or as black-box legacy controllers. In both cases, the resulting feedback control guarantees the safety of the composed driver assistance modules in a formally correct manner. Importantly, the quadratic program admits a closed-form solution that can be easily implemented. The effectiveness of the control approach is demonstrated by simulations in the industry-standard vehicle simulator Carsim.
AI Apr 21, 2022
Sample-Based Bounds for Coherent Risk Measures: Applications to Policy Synthesis and Verification
Prithvi Akella, Anushri Dixit, Mohamadreza Ahmadi et al.
The dramatic increase of autonomous systems subject to variable environments has given rise to the pressing need to consider risk in both the synthesis and verification of policies for these systems. This paper aims to address a few problems regarding risk-aware verification and policy synthesis, by first developing a sample-based method to bound the risk measure evaluation of a random variable whose distribution is unknown. These bounds permit us to generate high-confidence verification statements for a large class of robotic systems. Second, we develop a sample-based method to determine solutions to non-convex optimization problems that outperform a large fraction of the decision space of possible solutions. Both sample-based approaches then permit us to rapidly synthesize risk-aware policies that are guaranteed to achieve a minimum level of system performance. To showcase our approach in simulation, we verify a cooperative multi-agent system and develop a risk-aware controller that outperforms the system's baseline controller. We also mention how our approach can be extended to account for any $g$-entropic risk measure - the subset of coherent risk measures on which we focus.
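To give a feel for the claim about finding solutions that "outperform a large fraction of the decision space", here is a minimal sketch of one standard order-statistics argument. It is not taken from the paper and may differ from the authors' bounds: if N candidates are drawn i.i.d. and the best one is kept, that candidate lies in the top `epsilon` fraction of the sampling distribution with probability 1 - (1 - epsilon)^N. The function name and parameters are hypothetical.

```python
import math


def samples_needed(epsilon: float, confidence: float) -> int:
    """Smallest N such that the best of N i.i.d. samples lies in the top
    `epsilon` fraction of the sampling distribution with probability at
    least `confidence`, i.e. 1 - (1 - epsilon)**N >= confidence.

    Illustrative helper only; not an API from the paper.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("epsilon and confidence must lie strictly in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - epsilon))


if __name__ == "__main__":
    n = samples_needed(epsilon=0.05, confidence=0.95)
    print(n, 1.0 - (1.0 - 0.05) ** n)
```

The required sample count depends only on `epsilon` and `confidence`, not on the dimension of the decision space, which is what makes arguments of this kind attractive for non-convex problems.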
SY Oct 24, 2018
Compositional Set Invariance in Network Systems with Assume-Guarantee Contracts
Yuxiao Chen, James Anderson, Karan Kalsi et al.
This paper presents an assume-guarantee reasoning approach to the computation of robust invariant sets for network systems. Parameterized signal temporal logic (pSTL) is used to formally describe the behaviors of the subsystems, which we use as the template for the contract. We show that set invariance can be proved with a valid assume-guarantee contract by reasoning about individual subsystems. If a valid assume-guarantee contract with monotonic pSTL template is known, it can be further refined by value iteration. When such a contract is not known, an epigraph method is proposed to solve for a valid contract, an approach that has linear complexity for a sparse network. A microgrid example is used to demonstrate the proposed method. The simulation result shows that, together with control barrier functions, the states of all the subsystems can be bounded inside the individual robust invariant sets.
SY Nov 6, 2019
Duality between density function and value function with applications in constrained optimal control and Markov Decision Process
Yuxiao Chen, Aaron D. Ames
A density function describes the density of states in the state space of a dynamic system or a Markov Decision Process (MDP). Its evolution follows the Liouville equation. We show that the density function is the dual of the value function in optimal control problems. By utilizing this duality, constraints that are hard to enforce in the primal value function optimization, such as safety constraints in robot navigation or traffic capacity constraints in traffic flow control, can be posed on the density function, and the constrained optimal control problem can be solved with a primal-dual algorithm that alternates between the primal and dual optimization. The primal optimization follows the standard optimal control algorithm with a perturbation term generated by the density constraint, and the dual problem solves the Liouville equation to get the density function under a fixed control strategy and updates the perturbation term. Moreover, the proposed method can be extended to the case with exogenous disturbance, and can guarantee robust safety under the worst-case disturbance. We apply the proposed method to three examples: a robot navigation problem and a traffic control problem in simulation, and a Segway control problem in experiment.
SY Dec 12, 2022
Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data
Prithvi Akella, Skylar X. Wei, Joel W. Burdick et al.
Recent advances in safety-critical risk-aware control are predicated on a priori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk -- a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.
91.2 OC May 20
$π$MPC: A Parallel-in-horizon and Construction-free NMPC Solver
Liang Wu, Bo Yang, Junheng Li et al.
The alternating direction method of multipliers (ADMM) has gained increasing popularity in embedded model predictive control (MPC) due to its code simplicity and pain-free parameter selection. However, existing ADMM solvers either target general quadratic programming (QP) problems or exploit sparse MPC formulations via Riccati recursions, which are inherently sequential and therefore difficult to parallelize for long prediction horizons. This technical note proposes a novel parallel-in-horizon and construction-free nonlinear MPC algorithm, termed $π$MPC, which combines a new variable-splitting scheme with a velocity-based system representation in the ADMM framework, enabling horizon-wise parallel execution while operating directly on system matrices without explicit MPC-to-QP construction. Numerical experiments and accompanying code are provided to validate the effectiveness of the proposed method.
SY Dec 19, 2025
Distributionally Robust Imitation Learning: Layered Control Architecture for Certifiable Autonomy
Aditya Gahlawat, Ahmed Aboudonia, Sandeep Banik et al.
Imitation learning (IL) enables autonomous behavior by learning from expert demonstrations. While more sample-efficient than comparative alternatives like reinforcement learning, IL is sensitive to compounding errors induced by distribution shifts. There are two significant sources of distribution shifts when using IL-based feedback laws on systems: distribution shifts caused by policy error, and distribution shifts due to exogenous disturbances and endogenous model errors due to lack of learning. Our previously developed approaches, Taylor Series Imitation Learning (TaSIL) and $\mathcal{L}_1$-Distributionally Robust Adaptive Control ($\mathcal{L}_1$-DRAC), address the challenge of distribution shifts in complementary ways. While TaSIL offers robustness against policy error-induced distribution shifts, $\mathcal{L}_1$-DRAC offers robustness against distribution shifts due to aleatoric and epistemic uncertainties. To enable certifiable IL for learned and/or uncertain dynamical systems, we formulate the Distributionally Robust Imitation Policy (DRIP) architecture, a Layered Control Architecture (LCA) that integrates TaSIL and $\mathcal{L}_1$-DRAC. By judiciously designing individual layer-centric input and output requirements, we show how we can guarantee certificates for the entire control pipeline. Our solution paves the path for designing fully certifiable autonomy pipelines, by integrating learning-based components, such as perception, with certifiable model-based decision-making through the proposed LCA approach.
93.3 SY Apr 21
Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation
Pol Mestres, Shima Sadat Mousavi, Pio Ong et al.
This paper studies the efficient implementation of safety filters that are designed using control barrier functions (CBFs), which minimally modify a nominal controller to render it safe with respect to a prescribed set of states. Although CBF-based safety filters are often implemented by solving a quadratic program (QP) in real time, the use of off-the-shelf solvers for such optimization problems poses a challenge in applications where control actions need to be computed efficiently at very high frequencies. In this paper, we introduce a closed-form expression for controllers obtained through CBF-based safety filters. This expression is obtained by partitioning the state-space into different regions, with a different closed-form solution in each region. We leverage this formula to introduce a resource-aware implementation of CBF-based safety filters that detects changes in the partition region and uses the closed-form expression between changes. We showcase the applicability of our approach in examples ranging from aerospace control to safe reinforcement learning.
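To illustrate why a closed form is possible at all, the sketch below covers only the simplest case: a single affine CBF constraint `a + b @ u >= 0` and no input bounds, where the minimally invasive filter is a Euclidean projection onto a half-space. This is a well-known special case and not the paper's region-partitioned expression; the function name and the toy system are assumptions made for illustration.

```python
import numpy as np


def cbf_safety_filter(u_nom, a, b):
    """Solve  min_u ||u - u_nom||^2  s.t.  a + b @ u >= 0  in closed form.

    For dynamics x' = f(x) + g(x) u and barrier h, one would typically set
    a = L_f h(x) + alpha(h(x)) and b = L_g h(x) (assumed notation).
    """
    u_nom = np.asarray(u_nom, dtype=float)
    b = np.asarray(b, dtype=float)
    margin = a + b @ u_nom
    if margin >= 0.0:
        # Nominal input already satisfies the constraint: leave it untouched.
        return u_nom
    bb = b @ b
    if bb == 0.0:
        raise ValueError("constraint is infeasible: b = 0 and a < 0")
    # Project u_nom onto the boundary of the half-space {u : a + b @ u >= 0}.
    return u_nom - (margin / bb) * b


if __name__ == "__main__":
    # Toy example: x' = u, h(x) = 1 - x, alpha(h) = h, evaluated at x = 0.5.
    x = 0.5
    a, b = 1.0 - x, np.array([-1.0])   # constraint reads u <= 1 - x
    print(cbf_safety_filter([2.0], a, b))   # unsafe nominal input is clipped
    print(cbf_safety_filter([0.2], a, b))   # safe nominal input passes through
```

The two branches (constraint inactive, constraint active) are the simplest instance of splitting the state space into regions with a different explicit formula in each.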
59.4 SY Apr 21
Output Feedback Backup Control Barrier Functions: Safety Guarantees Under Input Bounds and State Estimation Error
David E. J. van Wijk, Tamas G. Molnar, Samuel Coogan et al.
Guaranteeing the safety of controllers is vital for real-world applications, but is markedly difficult when the states are not perfectly known and when the control inputs are bounded. Backup control barrier functions (bCBFs) use predictions of the flow under a prescribed controller to achieve safety in the presence of bounded inputs and perfect state information. However, when only an estimate of the true state is known, this flow may not be precisely computed, as the initial condition is unknown. Furthermore, the true flow evolves using feedback from the estimated state, thus introducing coupling between known and unknown flows. To address these challenges, we propose a technique that leverages an uncertainty envelope centered around the estimated flow and show that ensuring the safety of this envelope guarantees that the true state satisfies the safety constraints. Additionally, we show that in the presence of state uncertainty, using the resulting Output Feedback Backup Control Barrier Functions (O-bCBFs), there always exists a feasible control input that can guarantee the safety of the true state, even in the presence of input constraints.
63.3 RO Mar 29
Safety Guardrails in the Sky: Realizing Control Barrier Functions on the VISTA F-16 Jet
Andrew W. Singletary, Max H. Cohen, Tamas G. Molnar et al.
The advancement of autonomous systems -- from legged robots to self-driving vehicles and aircraft -- necessitates executing increasingly high-performance and dynamic motions without ever putting the system or its environment in harm's way. In this paper, we introduce Guardrails -- a novel runtime assurance mechanism that guarantees dynamic safety for autonomous systems, allowing them to safely evolve on the edge of their operational domains. Rooted in the theory of control barrier functions, Guardrails offers a control strategy that carefully blends commands from a human or AI operator with safe control actions to guarantee safe behavior. To demonstrate its capabilities, we implemented Guardrails on an F-16 fighter jet and conducted flight tests where Guardrails supervised a human pilot to enforce g-limits, altitude bounds, geofence constraints, and combinations thereof. Throughout extensive flight testing, Guardrails successfully ensured safety, keeping the pilot in control when safe to do so and minimally modifying unsafe pilot inputs otherwise.
88.2 SY Apr 16
Safety Filtering with an Infinite Number of Constraints
Max H. Cohen, Pio Ong, Pol Mestres et al.
Control barrier functions (CBFs) provide a rigorous framework for designing controllers enforcing safety constraints. While CBF theory is well-developed for a finite number of safety constraints, certain applications, e.g., backup CBFs, require an infinite number of constraints. Despite the practical success of CBFs, several fundamental questions remain unanswered when safe sets are defined with an infinite number of constraints, including: necessary and sufficient conditions for forward set invariance, the actual definition of CBFs associated with these sets, the regularity properties of the resulting controllers, and the ability to reduce a collection of infinite constraints to a finite number. This paper addresses these questions by extending CBF theory to the infinite constraint setting. We identify regularity conditions under which Nagumo's Theorem reduces to barrier-like inequalities and when the associated CBF controllers are at least continuous. We further connect these results to optimal-decay CBFs, bridging theoretical conditions for invariance and practical instantiations of the resulting controller. Finally, we illustrate how the developed theory addresses limitations of backup CBFs.
41.2 RO Apr 23
Full-Body Dynamic Safety for Robot Manipulators: 3D Poisson Safety Functions for CBF-Based Safety Filters
Meg Wilkinson, Gilbert Bahati, Ryan M. Bena et al.
Collision avoidance for robotic manipulators requires enforcing full-body safety constraints in high-dimensional configuration spaces. Control Barrier Function (CBF) based safety filters have proven effective in enabling safe behaviors, but enforcing the high number of constraints needed for safe manipulation leads to theoretic and computational challenges. This work presents a framework for full-body collision avoidance for manipulators in dynamic environments by leveraging 3D Poisson Safety Functions (PSFs). In particular, given environmental occupancy data, we sample the manipulator surface at a prescribed resolution and shrink free space via a Pontryagin difference according to this resolution. On this buffered domain, we synthesize a globally smooth CBF by solving Poisson's equation, yielding a single safety function for the entire environment. This safety function, evaluated at each sampled point, yields task-space CBF constraints enforced by a real-time safety filter via a multi-constraint quadratic program. We prove that keeping the sample points safe in the buffered region guarantees collision avoidance for the entire continuous robot surface. The framework is validated on a 7-degree-of-freedom manipulator in dynamic environments.
29.8 SY Apr 4
SafeSpace: Aggregating Safe Sets from Backup Control Barrier Functions under Input Constraints
Pio Ong, David E. J. van Wijk, Massimiliano de Sa et al.
Control barrier functions (CBFs) provide a principled framework for enforcing safety in control systems -- yet the certified safe operating region in practice is often conservative, especially under input bounds. In many applications, multiple smaller safe sets can be certified independently, e.g., around distinct equilibria with different stabilizing controllers. This paper proposes a framework for uniting such regions into a single certified safe set using combinatorial CBFs. We refine the combinatorial CBF framework by introducing an auxiliary variable that enables logical compositions of individual CBFs. In the proposed framework, we show that such compositions yield a generalized combinatorial CBF under a condition termed conjunctive compatibility. Building on this result, we extend the framework to enable the aggregation of multiple implicit safe sets generated by the backup CBF framework. We show that the resulting CBF-based quadratic program yields a continuous safety filter over the aggregated safe region. The approach is demonstrated on two spacecraft safety problems, safe attitude control and safe station keeping, where multiple certified safe regions are combined to expand the operational envelope.
44.3 RO Apr 20
HALO: Hybrid Auto-encoded Locomotion with Learned Latent Dynamics, Poincaré Maps, and Regions of Attraction
Blake Werner, Sergio A. Esteban, Massimiliano De Sa et al.
Reduced-order models are powerful for analyzing and controlling high-dimensional dynamical systems. Yet constructing these models for complex hybrid systems such as legged robots remains challenging. Classical approaches rely on hand-designed template models (e.g., LIP, SLIP), which, though insightful, only approximate the underlying dynamics. In contrast, data-driven methods can extract more accurate low-dimensional representations, but it remains unclear when stability and safety properties observed in the latent space meaningfully transfer back to the full-order system. To bridge this gap, we introduce HALO (Hybrid Auto-encoded Locomotion), a framework for learning latent reduced-order models of periodic hybrid dynamics directly from trajectory data. HALO employs an autoencoder to identify a low-dimensional latent state together with a learned latent Poincaré map that captures step-to-step locomotion dynamics. This enables Lyapunov analysis and the construction of an associated region of attraction in the latent space, both of which can be lifted back to the full-order state space through the decoder. Experiments on a simulated hopping robot and full-body humanoid locomotion demonstrate that HALO yields low-dimensional models that retain meaningful stability structure and predict full-order region-of-attraction boundaries.
77.6 SY Mar 25
Integral Control Barrier Functions with Input Delay: Prediction, Feasibility, and Robustness
Adam K. Kiss, Ersin Das, Tamas G. Molnar et al.
Time delays in feedback control loops can cause controllers to respond too late, and with excessively large corrective actions, leading to unsafe behavior (violation of state constraints) and controller infeasibility (violation of input constraints). To address this problem, we develop a safety-critical control framework for nonlinear systems with input delay using dynamically defined (integral) controllers. Building on the concept of Integral Control Barrier Functions (ICBFs), we concurrently address two fundamental challenges: compensating the effect of delays, while ensuring feasibility when state and input constraints are imposed jointly. To this end, we embed predictor feedback into a dynamically defined control law to compensate for delays, with the predicted state evolving according to delay-free dynamics. Then, utilizing ICBFs, we formulate a quadratic program for safe control design. For systems subject to simultaneous state and input constraints, we derive a closed-form feasibility condition for the resulting controller, yielding a compatible ICBF pair that guarantees forward invariance under delay. We also address robustness to prediction errors (e.g., caused by delay uncertainty) using tunable robust ICBFs. Our approach is validated on an adaptive cruise control example with actuation delay.
54.8 RO Mar 26
Chasing Autonomy: Dynamic Retargeting and Control Guided RL for Performant and Controllable Humanoid Running
Zachary Olkin, William D. Compton, Ryan M. Bena et al.
Humanoid robots have the promise of locomoting like humans, including fast and dynamic running. Recently, reinforcement learning (RL) controllers that can mimic human motions have become popular as they can generate very dynamic behaviors, but they are often restricted to single motion play-back which hinders their deployment in long duration and autonomous locomotion. In this paper, we present a pipeline to dynamically retarget human motions through an optimization routine with hard constraints to generate improved periodic reference libraries from a single human demonstration. We then study the effect of both the reference motion and the reward structure on the reference and commanded velocity tracking, concluding that a goal-conditioned and control-guided reward which tracks dynamically optimized human data results in the best performance. We deploy the policy on hardware, demonstrating its speed and endurance by achieving running speeds of up to 3.3 m/s on a Unitree G1 robot and traversing hundreds of meters in real-world environments. Additionally, to demonstrate the controllability of the locomotion, we use the controller in a full perception and planning autonomy stack for obstacle avoidance while running outdoors.
21.1 OC Apr 3
High-Order Matrix Control Barrier Functions: Well-Posedness and Feasibility via Matrix Relative Degree
Samuel G. Gessow, Pio Ong, Aaron D. Ames et al.
Control barrier functions (CBFs) provide an effective framework for enforcing safety in dynamical systems with scalar constraints. However, many safety constraints are more naturally expressed as matrix-valued conditions, such as positive definiteness or eigenvalue bounds - scalar formulations introduce potential nonsmoothness that complicates analysis. Matrix control barrier functions (MCBFs) address this limitation by directly enforcing matrix-valued safety constraints. Yet for constraints where the control input does not appear in the first derivative, high-order formulations are required. While such extensions are well understood in the scalar case, they remain largely unexplored in the matrix case. This paper develops high-order matrix control barrier functions (HOMCBFs) and establishes conditions ensuring well-posedness and feasibility of the associated constraints, enabling enforcement of matrix-valued safety constraints for systems with high-order dynamics. We further show that, using an optimal-decay HOMCBF formulation, forward invariance can be ensured while requiring control only over the minimum eigenspace. The framework is demonstrated on a localization safety problem by enforcing positive definiteness of the information matrix for a double integrator system with a nonlinear measurement model.
80.4 SY Apr 3
Steering with Contingencies: Combinatorial Stabilization and Reach-Avoid Filters
Yana Lishkova, Pio Ong, Sander Tonkens et al.
In applications such as autonomous landing and navigation, it is often desirable to steer toward a target while retaining the ability to divert to at least $r$ (out of $p$) alternative sites if conditions change. In this work, we formalize this combinatorial contingency requirement and develop tractable control filters for enforcement. Combinatorial stabilization requires asymptotic stability of a selected equilibrium while ensuring the trajectory remains within the safe region of attraction of at least $r$-out-of-$p$ candidates. To enforce this requirement, we use control Lyapunov functions (CLFs) to construct regions of attraction, which are combined combinatorially within an optimization-based filter. Combinatorial targeting extends this framework to finite-horizon problems using Hamilton-Jacobi backward reach-avoid sets, accommodating shrinking reachable regions due to finite horizons or resource depletion. In both formulations, the resulting combinatorial stability filter and combinatorial reach-avoid filter require only $p+1$ constraints, preventing combinatorial blow-up and enabling safe real-time switching between targets. The framework is demonstrated on two examples where the filters ensure steering with contingency and enable safe diversion.
79.4 SY Mar 27
A Duality-Based Optimization Formulation of Safe Control Design with State Uncertainties
Xiao Tan, Rahal Nanayakkara, Paulo Tabuada et al.
State estimation uncertainty is prevalent in real-world applications, hindering the application of safety-critical control. Existing methods address this by strengthening a Control Barrier Function (CBF) condition either to handle actuation errors induced by state uncertainty, or to enforce stricter, more conservative sufficient conditions. In this work, we take a more direct approach and formulate a robust safety filter by analyzing the image of the set of all possible states under the CBF dynamics. We first prove that convexifying this image set does not change the set of possible inputs. Then, by leveraging duality, we propose an equivalent and tractable reformulation for cases where this convex hull can be expressed as a polytope or ellipsoid. Simulation results show the approach in this paper to be less conservative than existing alternatives.
46.0 SY Mar 19
Generalizations of Backup Control Barrier Functions: Expansion and Adaptation for Input-Bounded Safety-Critical Control
David E. J. van Wijk, Dohyun Lee, Ersin Das et al.
Guaranteeing the safety of nonlinear systems with bounded inputs remains a key challenge in safe autonomy. Backup control barrier functions (bCBFs) provide a powerful mechanism for constructing controlled invariant sets by propagating trajectories under a pre-verified backup controller to a forward invariant backup set. While effective, the standard bCBF method utilizes the same backup controller for both set expansion and safety certification, which can restrict the expanded safe set and lead to conservative dynamic behavior. In this study, we generalize the bCBF framework by separating the set-expanding controller from the verified backup controller, thereby enabling a broader class of expansion strategies while preserving formal safety guarantees. We establish sufficient conditions for forward invariance of the resulting implicit safe set and show how the generalized construction recovers existing bCBF methods as special cases. Moreover, we extend the proposed framework to parameterized controller families, enabling online adaptation of the expansion controller while maintaining safety guarantees in the presence of input bounds.
45.8 SY Mar 19
Topological Obstructions to the Existence of Control Barrier Functions
Massimiliano de Sa, Aaron D. Ames
In 1983, Brockett developed a topological necessary condition for the existence of continuous, asymptotically stabilizing control laws. Building upon recent work on necessary conditions for set stabilization, we develop Brockett-like necessary conditions for the existence of control barrier functions (CBFs). By leveraging the unique geometry of CBF safe sets, we provide simple and self-contained derivations of necessary conditions for the existence of CBFs and their safe, continuous controllers. We demonstrate the application of these conditions to instructive examples and kinematic nonholonomic systems, and discuss their relationship to Brockett's necessary condition.
58.6ROMay 15
Terrain Consistent Reference-Guided RL for Humanoid Navigation Autonomy
William D. Compton, Zachary Olkin, Aaron D. Ames
We present a method for training reference-guided, perceptive reinforcement learning locomotion policies for humanoid robots in which reference trajectories are modulated in training to be consistent with terrain geometry. Aiming to deploy our method with standard navigation autonomy infrastructure, we synthesize SE(2)-controllable reference trajectories inside the RL training loop, projecting desired footsteps onto valid footholds and adjusting swing-foot and center-of-mass trajectories to match the terrain. The resulting policy exposes a clean SE(2) velocity interface compatible with standard navigation planners. In simulation, environmentally-conditioned references significantly improve reference tracking performance compared to environment agnostic references. On hardware, we integrate the policy with an MPC + control barrier function planner and demonstrate long-horizon (>70m) closed-loop autonomous navigation on the Unitree G1 through outdoor environments containing rough terrain and consecutive flights of stairs, with all sensing and computation onboard.
39.5ROMar 25
MIRROR: Visual Motion Imitation via Real-time Retargeting and Teleoperation with Parallel Differential Inverse Kinematics
Junheng Li, Lizhi Yang, Aaron D. Ames
Real-time humanoid teleoperation requires inverse kinematics (IK) solvers that are both responsive and constraint-safe under kinematic redundancy and self-collision constraints. While differential IK enables efficient online retargeting, its locally linearized updates are inherently basin-dependent and often become trapped near joint limits, singularities, or active collision boundaries, leading to unsafe or stagnant behavior. We propose a GPU-parallelized, continuation-based differential IK that improves escape from such constraint-induced local minima while preserving real-time performance, promoting safety and stability. Multiple constrained IK quadratic programs are evaluated in parallel, together with a self-collision avoidance control barrier function (CBF), and a Lyapunov-based progression criterion selects updates that reduce the final global task-space error. The method is paired with a visual skeletal pose estimation pipeline that enables robust, real-time upper-body teleoperation on the THEMIS humanoid robot hardware in real-world tasks.
63.7SYApr 10
Probabilistic Control Barrier Functions for Systems with State Estimation Uncertainty using Sub-Gaussian Concentration
Kazuya Echigo, David E. J. van Wijk, Pol Mestres et al.
Safety-critical control systems, such as spacecraft performing proximity operations, must provide formal safety guarantees despite stochastic uncertainties from state estimation and unmodeled dynamics. Although Control Barrier Functions (CBFs) have been extended to stochastic systems, existing approaches typically face a trade-off between the tightness of probabilistic guarantees and computational tractability. This paper presents a particle-based probabilistic CBF framework that overcomes this limitation by exploiting the sub-Gaussian structure of the barrier function increment under Gaussian uncertainties. We establish that Gaussian uncertainties propagating through Lipschitz-continuous control-affine dynamics preserve sub-Gaussianity of the barrier function increment, with explicit tail bounds. Leveraging this structure, we derive finite-sample bounds on the approximation error between particle-based Conditional Value at Risk (CVaR) estimates and ground-truth probabilistic constraints; applying this yields a tractable optimization problem formulation with finite-sample safety certificates. We show through numerical experiments how the proposed approach provides tight yet provably valid probabilistic safety guarantees.
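To make the particle-based CVaR estimate mentioned above concrete, here is a minimal sketch of a plain empirical CVaR estimator over sampled losses. It is a generic textbook estimator, not the paper's bound or certificate. The function name, the tail-size rounding rule, and the reading of each loss as the negated barrier value $-h$ at a sampled next state are assumptions.

```python
import math


def empirical_cvar(losses, alpha):
    """Average of the worst (1 - alpha) fraction of sampled losses.

    losses: one scalar loss per particle (assumed here to be -h(x_next)
            evaluated at each sampled next state).
    alpha:  risk level in [0, 1); alpha = 0 recovers the plain mean.
    """
    if not losses:
        raise ValueError("need at least one particle")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(losses)
    # Round before ceil so floating-point noise in (1 - alpha) * n
    # does not add a spurious extra sample to the tail.
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


if __name__ == "__main__":
    particles = [float(i) for i in range(1, 11)]
    print(empirical_cvar(particles, 0.8))
```

Under the assumed sign convention, requiring `empirical_cvar(losses, alpha) <= 0` is a sample-based surrogate for a probabilistic safety constraint. The finite-sample correction that turns it into a certificate is the paper's contribution and is not reproduced here.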
LGFeb 5, 2022Code
LyaNet: A Lyapunov Framework for Training Neural ODEs
Ivan Dario Jimenez Rodriguez, Aaron D. Ames, Yisong Yue
We propose a method for training ordinary differential equations by using a control-theoretic Lyapunov condition for stability. Our approach, called LyaNet, is based on a novel Lyapunov loss formulation that encourages the inference dynamics to converge quickly to the correct prediction. Theoretically, we show that minimizing Lyapunov loss guarantees exponential convergence to the correct solution and enables a novel robustness guarantee. We also provide practical algorithms, including one that avoids the cost of backpropagating through a solver or using the adjoint method. Relative to standard Neural ODE training, we empirically find that LyaNet can offer improved prediction performance, faster convergence of inference dynamics, and improved adversarial robustness. Our code is available at https://github.com/ivandariojr/LyapunovLearning .
ROMar 8, 2021Code
Learning to Control an Unstable System with One Minute of Data: Leveraging Gaussian Process Differentiation in Predictive Control
Ivan D. Jimenez Rodriguez, Ugo Rosolia, Aaron D. Ames et al.
We present a straightforward and efficient way to control unstable robotic systems using an estimated dynamics model. Specifically, we show how to exploit the differentiability of Gaussian Processes to create a state-dependent linearized approximation of the true continuous dynamics that can be integrated with model predictive control. Our approach is compatible with most Gaussian process approaches for system identification, and can learn an accurate model using modest amounts of training data. We validate our approach by learning the dynamics of an unstable system such as a segway with a 7-D state space and 2-D input space (using only one minute of data), and we show that the resulting controller is robust to unmodelled dynamics and disturbances, while state-of-the-art control methods based on nominal models can fail under small perturbations. Code is open sourced at https://github.com/learning-and-control/core .
90.7CTApr 6
Hybrid Systems as Coalgebras: Lyapunov Morphisms for Zeno Stability
Joe Moeller, Aaron D. Ames
Hybrid dynamical systems exhibit a diverse array of stability phenomena, each currently addressed by separate Lyapunov-like results. We show that these results are all instances of a single theorem: a Lyapunov function is a morphism from a hybrid system into a simple stable target system $Ï$, and different stability notions such as Lyapunov stability, asymptotic stability, exponential stability, and Zeno stability correspond to different choices of $Ï$. This unification is achieved by expressing hybrid systems as coalgebras of an endofunctor $\mathcal H$ on a category $\mathsf{Chart}$ that naturally blends continuous and discrete dynamics. Instantiating a general categorical Lyapunov theorem for coalgebras to this setting results in new Lyapunov-like conditions for the stability of Zeno equilibria and the existence of Zeno behavior in hybrid systems.
10.5ROMay 5
On Surprising Effects of Risk-Aware Domain Randomization for Contact-Rich Sampling-based Predictive Control
Sergio A. Esteban, Junheng Li, Vince Kurtz et al.
Domain randomization (DR) is widely used in policy learning to improve robustness to modeling error, but remains underexplored in contact-rich sampling-based predictive control (SPC), where rollout quality is highly sensitive to uncertainty. In this work, we take the first step by studying risk-aware DR in predictive sampling on a simple yet representative Push-T task, comparing average, optimistic, and pessimistic rollout aggregations under randomized model instances. Our initial results suggest that DR affects not only robustness to model error, but also the effective cost landscape seen by the sampling-based optimizer, by reshaping the basin of attraction around contact-producing actions. This opens up potential for exploring better grounded risk-aware contact-rich SPC under model uncertainty. Video: https://youtu.be/f1F0ALXxhSM
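The three rollout aggregations compared in this abstract can be sketched as follows. This is a toy illustration with made-up costs, not the authors' implementation. `COSTS`, `AGGREGATORS`, and `select_action` are hypothetical names, and each row is assumed to hold one candidate action sequence's rollout cost under several randomized model instances.

```python
from statistics import mean

# Hypothetical rollout costs: rows are candidate action sequences,
# columns are randomized model instances (the numbers are invented).
COSTS = [
    [1.0, 9.0, 2.0],   # candidate 0: good on average, one bad model
    [4.0, 4.5, 4.0],   # candidate 1: mediocre but consistent
    [0.5, 10.0, 8.0],  # candidate 2: excellent under one model only
]

# Each aggregation collapses a candidate's per-model costs to one score.
AGGREGATORS = {
    "average": mean,     # risk-neutral
    "optimistic": min,   # best case over the sampled models
    "pessimistic": max,  # worst case over the sampled models
}


def select_action(costs, mode):
    """Return the index of the candidate with the lowest aggregated cost."""
    aggregate = AGGREGATORS[mode]
    scores = [aggregate(row) for row in costs]
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    for mode in AGGREGATORS:
        print(mode, select_action(COSTS, mode))
```

With these invented numbers each aggregation picks a different candidate. This is a small-scale picture of how the aggregation choice changes the cost landscape the sampling-based optimizer sees.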
92.7SYMay 3
Stability of Control Lyapunov Function Guided Reinforcement Learning
Zachary Olkin, William D. Compton, Aaron D. Ames
Reinforcement learning (RL) has become the de facto method for achieving locomotion on humanoid robots in practice, yet stability analysis of the corresponding control policies is lacking. Recent work has attempted to merge control theoretic ideas with reinforcement learning through control guided learning. A notable example of this is the use of a control Lyapunov function (CLF) to synthesize the reinforcement learning rewards, a technique known as CLF-RL, which has shown practical success. This paper investigates the stability properties of optimal controllers using CLF-RL with the goal of bridging experimentally observed stability with theoretical guarantees. The RL problem is viewed as an optimal control problem and exponential stability is proven in both continuous and discrete time using both core CLF reward terms and the additional terms used in practice. The theoretical bounds are numerically verified on systems such as the double integrator and cart-pole. Finally, the CLF guided rewards are implemented for a walking humanoid robot to generate stable periodic orbits.
SYDec 5, 2024
Learning for Layered Safety-Critical Control with Predictive Control Barrier Functions
William D. Compton, Max H. Cohen, Aaron D. Ames
Safety filters leveraging control barrier functions (CBFs) are highly effective for enforcing safe behavior on complex systems. It is often easier to synthesize CBFs for a Reduced order Model (RoM), and track the resulting safe behavior on the Full order Model (FoM) -- yet gaps between the RoM and FoM can result in safety violations. This paper introduces predictive CBFs to address this gap by leveraging rollouts of the FoM to define a predictive robustness term added to the RoM CBF condition. Theoretically, we prove that this guarantees safety in a layered control implementation. Practically, we learn the predictive robustness term through massive parallel simulation with domain randomization. We demonstrate in simulation that this yields safe FoM behavior with minimal conservatism, and experimentally realize predictive CBFs on a 3D hopping robot.
89.4OCApr 6
Collaborative Altruistic Safety in Coupled Multi-Agent Systems
Brooks A. Butler, Xiao Tan, Aaron D. Ames et al.
This paper presents a novel framework for ensuring safety in dynamically coupled multi-agent systems through collaborative control. Drawing inspiration from ecological models of altruism, we develop collaborative control barrier functions that allow agents to cooperatively enforce individual safety constraints under coupling dynamics. We introduce an altruistic safety condition based on the so-called Hamilton's rule, enabling agents to trade off their own safety to support higher-priority neighbors. By incorporating these conditions into a distributed optimization framework, we demonstrate increased feasibility and robustness in maintaining system-wide safety. The effectiveness of the proposed approach is illustrated through simulation in a simplified formation control scenario.
58.9SYApr 5
Stability Margins of CBF-QP Safety Filters: Analysis and Synthesis
Shima Sadat Mousavi, Pol Mestres, Aaron D. Ames
Control barrier function (CBF)-QP safety filters enforce safety by minimally modifying a nominal controller. While prior work has mainly addressed robustness of safety under uncertainty, robustness of the resulting closed-loop stability is much less understood. This issue is important because once the safety filter becomes active, it modifies the nominal dynamics and can reduce stability margins or even destabilize the system, despite preserving safety. For linear systems with a single affine safety constraint, we show that the active-mode dynamics admit an exact scalar loop representation, leading to a classical robust-control interpretation in terms of gain, phase, and delay margins. This viewpoint yields exact stability-margin characterizations and tractable linear matrix inequality (LMI)-based certificates and synthesis conditions for controllers with certified robustness guarantees. Numerical examples illustrate the proposed analysis and the enlargement of certified stability margins for safety-filtered systems.
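For readers unfamiliar with how a CBF-QP "minimally modifies" a nominal input, the single-constraint case has a well-known closed form: the QP $\min_u \|u - u_{\text{nom}}\|^2$ subject to $a^\top u \ge b$ is a projection onto a half-space. The sketch below implements that projection. The function name and the interpretation $a = L_g h(x)$, $b = -L_f h(x) - \alpha h(x)$ are illustrative assumptions, and it does not reproduce the stability-margin analysis of the paper.

```python
def safety_filter(u_nom, a, b):
    """Closest input to u_nom (Euclidean norm) satisfying a . u >= b.

    Assumed CBF reading: a = L_g h(x), b = -L_f h(x) - alpha * h(x),
    so that a . u >= b is the usual CBF condition at state x.
    """
    a_dot_u = sum(ai * ui for ai, ui in zip(a, u_nom))
    a_sq = sum(ai * ai for ai in a)
    violation = b - a_dot_u
    if violation <= 0.0:
        return list(u_nom)  # filter inactive: nominal input already safe
    if a_sq == 0.0:
        raise ValueError("constraint is violated and cannot be changed by u")
    # Filter active: shift u_nom along a just enough to reach a . u = b.
    scale = violation / a_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    print(safety_filter([1.0, 0.0], [0.0, 1.0], 0.5))  # active mode
    print(safety_filter([0.0, 2.0], [0.0, 1.0], 0.5))  # inactive mode
```

The two branches correspond to the inactive and active modes of the filter. The abstract's point is that the active branch changes the closed-loop dynamics, which is why stability margins need separate analysis.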
80.1SYApr 5
Structure, Feasibility, and Explicit Safety Filters for Linear Systems
Shima Sadat Mousavi, Max H. Cohen, Pol Mestres et al.
Safety filters based on control barrier functions (CBFs) and high-order control barrier functions (HOCBFs) are often implemented through quadratic programs (QPs). In general, especially in the presence of multiple constraints, feasibility is difficult to certify before solving the QP and may be lost as the state evolves. This paper addresses this issue for linear time-invariant (LTI) systems with affine safety constraints. Exploiting the resulting geometry of the constraint normals, and considering both unbounded and bounded inputs, we characterize feasibility for several structured classes of constraints. For certain such cases, we also derive closed-form safety filters. These explicit filters avoid online optimization and provide a simple alternative to QP-based implementations. Numerical examples illustrate the results.
ROOct 16, 2025
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Lizhi Yang, Blake Werner, Massimiliano de Sa et al.
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, and (2) safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
ROOct 16, 2025
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Blake Werner, Lizhi Yang, Aaron D. Ames
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
RONov 22, 2024
Dynamic Tube MPC: Learning Tube Dynamics with Massively Parallel Simulation for Robust Safety in Practice
William D. Compton, Noel Csomay-Shanklin, Cole Johnson et al.
Safe navigation of cluttered environments is a critical challenge in robotics. It is typically approached by separating the planning and tracking problems, with planning executed on a reduced order model to generate reference trajectories, and control techniques used to track these trajectories on the full order dynamics. Inevitable tracking error necessitates robustification of the nominal plan to ensure safety; in many cases, this is accomplished via worst-case bounding, which ignores the fact that some trajectories of the planning model may be easier to track than others. In this work, we present a novel method leveraging massively parallel simulation to learn a dynamic tube representation, which characterizes tracking performance as a function of actions taken by the planning model. Planning model trajectories are then optimized such that the dynamic tube lies in the free space, allowing a balance between performance and safety to be traded off in real time. The resulting Dynamic Tube MPC is applied to the 3D hopping robot ARCHER, enabling agile and performant navigation of cluttered environments, and safe collision-free traversal of narrow corridors.
LGJun 20, 2024
Preferential Multi-Objective Bayesian Optimization
Raul Astudillo, Kejun Li, Maegan Tucker et al.
Preferential Bayesian optimization (PBO) is a framework for optimizing a decision-maker's latent preferences over available design choices. While preferences often involve multiple conflicting objectives, existing work in PBO assumes that preferences can be encoded by a single objective function. For example, in robotic assistive devices, technicians often attempt to maximize user comfort while simultaneously minimizing mechanical energy consumption for longer battery life. Similarly, in autonomous driving policy design, decision-makers wish to understand the trade-offs between multiple safety and performance attributes before committing to a policy. To address this gap, we propose the first framework for PBO with multiple objectives. Within this framework, we present dueling scalarized Thompson sampling (DSTS), a multi-objective generalization of the popular dueling Thompson algorithm, which may be of interest beyond the PBO setting. We evaluate DSTS across four synthetic test functions and two simulated exoskeleton personalization and driving policy design tasks, showing that it outperforms several benchmarks. Finally, we prove that DSTS is asymptotically consistent. As a direct consequence, this result provides, to our knowledge, the first convergence guarantee for dueling Thompson sampling in the PBO setting.
SYJan 12, 2022
Onboard Safety Guarantees for Racing Drones: High-speed Geofencing with Control Barrier Functions
Andrew Singletary, Aiden Swann, Yuxiao Chen et al.
This paper details the theory and implementation behind practically ensuring safety of remotely piloted racing drones. We demonstrate robust and practical safety guarantees on a 7" racing drone at speeds exceeding 100 km/h, utilizing only online computations on a 10 gram micro-controller. To achieve this goal, we utilize the framework of control barrier functions (CBFs) which give guaranteed safety encoded as forward set invariance. To make this methodology practically applicable, we present an implicitly defined CBF which leverages backup controllers to enable gradient-free evaluations that ensure safety. The method applied to hardware results in smooth, minimally conservative alterations of the pilots' desired inputs, enabling them to push the limits of their drone without fear of crashing. Moreover, the method works in conjunction with the preexisting flight controller, resulting in unaltered flight when there are no nearby safety risks. Additional benefits include safety and stability of the drone when losing line-of-sight or in the event of radio failure.
SYJan 4, 2022
Test and Evaluation of Quadrupedal Walking Gaits through Sim2Real Gap Quantification
Prithvi Akella, Wyatt Ubellacker, Aaron D. Ames
In this letter, the authors propose a two-step approach to evaluate and verify a true system's capacity to satisfy its operational objective. Specifically, whenever the system objective has a quantifiable measure of satisfaction (e.g., a signal temporal logic specification or a barrier function), the authors develop two separate optimization problems solvable via a Bayesian Optimization procedure detailed within. This dual approach has the added benefit of quantifying the Sim2Real Gap between a system simulator and its hardware counterpart. Our contributions are twofold. First, we show repeatability with respect to our outlined optimization procedure in solving these optimization problems. Second, we show that the same procedure can discriminate between different environments by identifying the Sim2Real Gap between a simulator and its hardware counterpart operating in different environments.
RODec 15, 2021
Safety-Aware Preference-Based Learning for Safety-Critical Control
Ryan K. Cosner, Maegan Tucker, Andrew J. Taylor et al.
Bringing dynamic robots into the wild requires a tenuous balance between performance and safety. Yet controllers designed to provide robust safety guarantees often result in conservative behavior, and tuning these controllers to find the ideal trade-off between performance and safety typically requires domain expertise or a carefully constructed reward function. This work presents a design paradigm for systematically achieving behaviors that balance performance and robust safety by integrating safety-aware Preference-Based Learning (PBL) with Control Barrier Functions (CBFs). Fusing these concepts -- safety-aware learning and safety-critical control -- gives a robust means to achieve safe behaviors on complex robotic systems in practice. We demonstrate the capability of this design paradigm to achieve safe and performant perception-based autonomous operation of a quadrupedal robot both in simulation and experimentally on hardware.
SYDec 15, 2021
Safety-Critical Control with Input Delay in Dynamic Environment
Tamas G. Molnar, Adam K. Kiss, Aaron D. Ames et al.
Endowing nonlinear systems with safe behavior is increasingly important in modern control. This task is particularly challenging for real-life control systems that must operate safely in dynamically changing environments. This paper develops a framework for safety-critical control in dynamic environments, by establishing the notion of environmental control barrier functions (ECBFs). The framework is able to guarantee safety even in the presence of input delay, by accounting for the evolution of the environment during the delayed response of the system. The underlying control synthesis relies on predicting the future state of the system and the environment over the delay interval, with robust safety guarantees against prediction errors. The efficacy of the proposed method is demonstrated by a simple adaptive cruise control problem and a more complex robotics application on a Segway platform.
ROOct 3, 2021
Mixed Observable RRT: Multi-Agent Mission-Planning in Partially Observable Environments
Kasper Johansson, Ugo Rosolia, Wyatt Ubellacker et al.
This paper considers centralized mission-planning for a heterogeneous multi-agent system with the aim of locating a hidden target. We propose a mixed observable setting, consisting of a fully observable state-space and a partially observable environment, using a hidden Markov model. First, we construct rapidly exploring random trees (RRTs) to introduce the mixed observable RRT for finding plausible mission plans giving way-points for each agent. Leveraging this construction, we present a path-selection strategy based on a dynamic programming approach, which accounts for the uncertainty from partial observations and minimizes the expected cost. Finally, we combine the high-level plan with model predictive control algorithms to evaluate the approach on an experimental setup consisting of a quadruped robot and a drone. It is shown that agents are able to make intelligent decisions to explore the area efficiently and to locate the target through collaborative actions.
ROSep 19, 2021
Model-Free Safety-Critical Control for Robotic Systems
Tamas G. Molnar, Ryan K. Cosner, Andrew W. Singletary et al.
This paper presents a framework for the safety-critical control of robotic systems, when safety is defined on safe regions in the configuration space. To maintain safety, we synthesize a safe velocity based on control barrier function theory without relying on a -- potentially complicated -- high-fidelity dynamical model of the robot. Then, we track the safe velocity with a tracking controller. This culminates in model-free safety critical control. We prove theoretical safety guarantees for the proposed method. Finally, we demonstrate that this approach is application-agnostic. We execute an obstacle avoidance task with a Segway in high-fidelity simulation, as well as with a Drone and a Quadruped in hardware experiments.
SYSep 10, 2021
Interactive multi-modal motion planning with Branch Model Predictive Control
Yuxiao Chen, Ugo Rosolia, Wyatt Ubellacker et al.
Motion planning for autonomous robots and vehicles in the presence of uncontrolled agents remains a challenging problem as the reactive behaviors of the uncontrolled agents must be considered. Since the uncontrolled agents usually demonstrate multimodal reactive behavior, the motion planner needs to solve a continuous motion planning problem under these behaviors, which contains a discrete element. We propose a branch Model Predictive Control (MPC) framework that plans over feedback policies to leverage the reactive behavior of the uncontrolled agent. In particular, a scenario tree is constructed from a finite set of policies of the uncontrolled agent, and the branch MPC solves for a feedback policy in the form of a trajectory tree, which shares the same topology as the scenario tree. Moreover, coherent risk measures such as the Conditional Value at Risk (CVaR) are used as a tuning knob to adjust the tradeoff between performance and robustness. The proposed branch MPC framework is tested on an overtake and lane change task and a merging task for autonomous vehicles in simulation, and on the motion planning of an autonomous quadruped robot alongside an uncontrolled quadruped in experiments. The results demonstrate interesting human-like behaviors, achieving a balance between safety and performance.
ROSep 10, 2021
Natural Multicontact Walking for Robotic Assistive Devices via Musculoskeletal Models and Hybrid Zero Dynamics
Kejun Li, Maegan Tucker, Rachel Gehlhar et al.
Generating stable walking gaits that yield natural locomotion when executed on robotic-assistive devices is a challenging task that often requires hand-tuning by domain experts. This paper presents an alternative methodology, where we propose the addition of musculoskeletal models directly into the gait generation process to intuitively shape the resulting behavior. In particular, we construct a multi-domain hybrid system model that combines the system dynamics with muscle models to represent natural multicontact walking. Provably stable walking gaits can then be generated for this model via the hybrid zero dynamics (HZD) method. We experimentally apply our integrated framework towards achieving multicontact locomotion on a dual-actuated transfemoral prosthesis, AMPRO3, for two subjects. The results demonstrate that enforcing muscle model constraints produces gaits that yield natural locomotion (as analyzed via comparison to motion capture data and electromyography). Moreover, gaits generated with our framework were strongly preferred by the non-disabled prosthetic users as compared to gaits generated with the nominal HZD method, even with the use of systematic tuning methods. We conclude that the novel approach of combining robotic walking methods (specifically HZD) with muscle models successfully generates anthropomorphic robotic-assisted locomotion.
AISep 9, 2021
Risk-Averse Decision Making Under Uncertainty
Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham et al.
A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs), with application to artificial intelligence and operations research, among others. Traditionally, policy synthesis techniques are proposed such that a total expected cost or reward is minimized or maximized. However, optimality in the total expected cost sense is only reasonable if system behavior over a large number of runs is of interest, which has limited the use of such policies in practical mission-critical scenarios, wherein large deviations from the expected behavior may lead to mission failure. In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures, which we refer to as the constrained risk-averse problem. For MDPs, we reformulate the problem into an inf-sup problem via the Lagrangian framework and propose an optimization-based method to synthesize Markovian policies. For MDPs, we demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. For POMDPs, we show that, if the coherent risk measures can be defined as a Markov risk transition mapping, an infinite-dimensional optimization can be used to design Markovian belief-based policies. For stochastic finite-state controllers (FSCs), we show that the latter optimization simplifies to a (finite-dimensional) DCP and can be solved by the DCCP framework. We incorporate these DCPs in a policy iteration algorithm to design risk-averse FSCs for POMDPs.