44.5ROApr 27Code
Betting for Sim-to-Real Performance EvaluationZaid Mahboob, Yujia Chen, Bowen Weng
This paper studies the problem of robot performance evaluation, focusing on how to obtain accurate and efficient estimates of real-world behavior under severe constraints on physical experimentation. Such estimates are essential for benchmarking algorithms, comparing design alternatives, validating controllers, and supporting certification or regulatory decision-making, yet real-world testing with physical robots is often expensive, time-consuming, and safety-limited. To mitigate the scarcity of real-world trials, sim-to-real methodologies are commonly employed, using low-cost simulators to inform, supplement, or prioritize physical experiments. Departing from (and complementary to) existing approaches in variance reduction (e.g., importance-sampling variants) or bias-correction (e.g., through prediction-powered inference or learned control variates), we examine this performance-evaluation problem through the lens of betting. We establish theoretical conditions under which a betting mechanism can yield accurate and efficient estimates (provably outperforming the Monte Carlo estimator) and we characterize how such bets should be constructed. We further develop theoretically grounded yet practically implementable approximations of the ideal bet, and we provide concrete decision rules that diagnose when these approximate betting strategies are working as intended. We demonstrate the effectiveness of the proposed methods using both synthetic examples and cross-fidelity computational simulators. Notably, we also showcase an illustrative case in which a group of synthetic distributions are used to infer the real-world pick-and-place accuracy of a robotic manipulator, a seemingly unconventional sim-to-real transfer that becomes natural and feasible under the proposed betting perspective. Programs for reproducing empirical results are available at https://github.com/ISUSAIL/Bet4Sim2Real.
MAJan 29
Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial DataZhihao Zhang, Keith Redmill, Chengyang Peng et al.
A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle's decisions and corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data, but they lack context about the surrounding conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles on a microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using microscopic data to anchor observed vehicle behaviors, and learns a shared policy whose behavior is microscopically consistent with the partially observed trajectories and actions and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.
ROApr 21, 2025
Post-Convergence Sim-to-Real Policy Transfer: A Principled Alternative to Cherry-PickingDylan Khor, Bowen Weng
Learning-based approaches, particularly reinforcement learning (RL), have become widely used for developing control policies for autonomous agents, such as locomotion policies for legged robots. RL training typically maximizes a predefined reward (or minimizes a corresponding cost/loss) by iteratively optimizing policies within a simulator. Starting from a randomly initialized policy, the empirical expected reward follows a trajectory with an overall increasing trend. While some policies become temporarily stuck in local optima, a well-defined training process generally converges to a reward level with noisy oscillations. However, selecting a policy for real-world deployment is rarely an analytical decision (i.e., simply choosing the one with the highest reward) and is instead often performed through trial and error. To improve sim-to-real transfer, most research focuses on the pre-convergence stage, employing techniques such as domain randomization, multi-fidelity training, adversarial training, and architectural innovations. However, these methods do not eliminate the inevitable convergence trajectory and noisy oscillations of rewards, leading to heuristic policy selection or cherry-picking. This paper addresses the post-convergence sim-to-real transfer problem by introducing a worst-case performance transference optimization approach, formulated as a convex quadratic-constrained linear programming problem. Extensive experiments demonstrate its effectiveness in transferring RL-based locomotion policies from simulation to real-world laboratory tests.
ROFeb 17, 2022
A Formal Safety Characterization of Advanced Driver Assist Systems in the Car-Following Regime with Scenario-SamplingBowen Weng, Minghao Zhu, Keith Redmill
The capability to follow a lead-vehicle and avoid rear-end collisions is one of the most important functionalities for human drivers and various Advanced Driver Assist Systems (ADAS). Existing safety performance justification of the car-following systems either relies on simple concrete scenarios with biased surrogate metrics or requires a significantly long driving distance for risk observation and inference. In this paper, we propose a guaranteed unbiased and sampling efficient scenario-based safety evaluation framework inspired by the previous work on $εδ$-almost safe set quantification. The proposal characterizes the complete safety performance of the test subject in the car-following regime. The performance of the proposed method is also demonstrated in challenging cases including some widely adopted car-following decision-making modules and the commercially available Openpilot driving stack by CommaAI.
RONov 15, 2021
A Finite-Sampling, Operational Domain Specific, and Provably Unbiased Connected and Automated Vehicle Safety MetricBowen Weng, Linda Capito, Umit Ozguner et al.
A connected and automated vehicle safety metric determines the performance of a subject vehicle (SV) by analyzing the data involving the interactions among the SV and other dynamic road users and environmental features. When the data set contains only a finite set of samples collected from the naturalistic mixed-traffic driving environment, a metric is expected to generalize the safety assessment outcome from the observed finite samples to the unobserved cases by specifying in what domain the SV is expected to be safe and how safe the SV is, statistically, in that domain. However, to the best of our knowledge, none of the existing safety metrics are able to justify the above properties with an operational domain specific, guaranteed complete, and provably unbiased safety evaluation outcome. In this paper, we propose a novel safety metric that involves the $α$-shape and the $ε$-almost robustly forward invariant set to characterize the SV's almost safe operable domain and the probability for the SV to remain inside the safe domain indefinitely, respectively. The empirical performance of the proposed method is demonstrated in several different operational design domains through a series of cases covering a variety of fidelity levels (real-world and simulators), driving environments (highway, urban, and intersections), road users (car, truck, and pedestrian), and SV driving behaviors (human driver and self driving algorithms).
ROOct 5, 2021
A Formal Characterization of Black-Box System Safety Performance with Scenario SamplingBowen Weng, Linda Capito, Umit Ozguner et al.
A typical scenario-based evaluation framework seeks to characterize a black-box system's safety performance (e.g., failure rate) through repeatedly sampling initialization configurations (scenario sampling) and executing a certain test policy for scenario propagation (scenario testing) with the black-box system involved as the test subject. In this letter, we first present a novel safety evaluation criterion that seeks to characterize the actual operational domain within which the test subject would remain safe indefinitely with high probability. By formulating the black-box testing scenario as a dynamic system, we show that the presented problem is equivalent to finding a certain "almost" robustly forward invariant set for the given system. Second, for an arbitrary scenario testing strategy, we propose a scenario sampling algorithm that is provably asymptotically optimal in obtaining the safe invariant set with arbitrarily high accuracy. Moreover, as one considers different testing strategies (e.g., biased sampling of safety-critical cases), we show that the proposed algorithm still converges to the unbiased approximation of the safety characterization outcome if the scenario testing satisfies a certain condition. Finally, the effectiveness of the presented scenario sampling algorithms and various theoretical properties are demonstrated in a case study of the safety evaluation of a control barrier function-based mobile robot collision avoidance system.
ROApr 19, 2021
Towards Guaranteed Safety Assurance of Automated Driving Systems with Scenario Sampling: An Invariant Set Perspective (Extended Version)Bowen Weng, Linda Capito, Umit Ozguner et al.
How many scenarios are sufficient to validate the safe Operational Design Domain (ODD) of an Automated Driving System (ADS) equipped vehicle? Is a more significant number of sampled scenarios guaranteeing a more accurate safety assessment of the ADS? Despite the various empirical success of ADS safety evaluation with scenario sampling in practice, some of the fundamental properties are largely unknown. This paper seeks to remedy this gap by formulating and tackling the scenario sampling safety assurance problem from a set invariance perspective. First, a novel conceptual equivalence is drawn between the scenario sampling safety assurance problem and the data-driven robustly controlled forward invariant set validation and quantification problem. This paper then provides a series of resolution complete and probabilistic complete solutions with finite-sampling analyses for the safety validation problem that authenticates a given ODD. On the other hand, the quantification problem escalates the validation challenge and starts looking for a safe sub-domain of a particular property. This inspires various algorithms that are provably probabilistic incomplete, probabilistic complete but sub-optimal, and asymptotically optimal. Finally, the proposed asymptotically optimal scenario sampling safety quantification algorithm is also empirically demonstrated through simulation experiments.
ROMar 29, 2021
Robust Feedback Motion Policy Design Using Reinforcement Learning on a 3D Digit Bipedal RobotGuillermo A. Castillo, Bowen Weng, Wei Zhang et al.
In this paper, a hierarchical and robust framework for learning bipedal locomotion is presented and successfully implemented on the 3D biped robot Digit built by Agility Robotics. We propose a cascade-structure controller that combines the learning process with intuitive feedback regulations. This design allows the framework to realize robust and stable walking with a reduced-dimension state and action spaces of the policy, significantly simplifying the design and reducing the sampling efficiency of the learning method. The inclusion of feedback regulation into the framework improves the robustness of the learned walking gait and ensures the success of the sim-to-real transfer of the proposed controller with minimal tuning. We specifically present a learning pipeline that considers hardware-feasible initial poses of the robot within the learning process to ensure the initial state of the learning is replicated as close as possible to the initial state of the robot in hardware experiments. Finally, we demonstrate the feasibility of our method by successfully transferring the learned policy in simulation to the Digit robot hardware, realizing sustained walking gaits under external force disturbances and challenging terrains not included during the training process. To the best of our knowledge, this is the first time a learning-based policy is transferred successfully to the Digit robot in hardware experiments without using dynamic randomization or curriculum learning.
ROSep 25, 2020
A Modeled Approach for Online Adversarial Test of Operational Vehicle Safety (extended version)Linda Capito, Bowen Weng, Umit Ozguner et al.
The scenario-based testing of operational vehicle safety presents a set of principal other vehicle (POV) trajectories that seek to force the subject vehicle (SV) into a certain safety-critical situation. Current scenarios are mostly (i) statistics-driven: inspired by human driver crash data, (ii) deterministic: POV trajectories are pre-determined and are independent of SV responses, and (iii) overly simplified: defined over a finite set of actions performed at the abstracted motion planning level. Such scenario-based testing (i) lacks severity guarantees, (ii) has predefined maneuvers making it easy for an SV with intelligent driving policies to game the test, and (iii) is inefficient in producing safety-critical instances with limited and expensive testing effort. We propose a model-driven online feedback control policy for multiple POVs which propagates efficient adversarial trajectories while respecting traffic rules and other concerns formulated as an admissible state-action space. The approach is formulated in an anchor-template hierarchy structure, with the template model planning inducing a theoretical SV capturing guarantee under standard assumptions. The planned adversarial trajectory is then tracked by a lower-level controller applied to the full-system or the anchor model. The effectiveness of the methodology is illustrated through various simulated examples with the SV controlled by either parameterized self-driving policies or human drivers.
ROAug 2, 2020
Velocity Regulation of 3D Bipedal Walking Robots with Uncertain Dynamics Through Adaptive Neural Network ControllerGuillermo A. Castillo, Bowen Weng, Terrence C. Stewart et al.
This paper presents a neural-network based adaptive feedback control structure to regulate the velocity of 3D bipedal robots under dynamics uncertainties. Existing Hybrid Zero Dynamics (HZD)-based controllers regulate velocity through the implementation of heuristic regulators that do not consider model and environmental uncertainties, which may significantly affect the tracking performance of the controllers. In this paper, we address the uncertainties in the robot dynamics from the perspective of the reduced dimensional representation of virtual constraints and propose the integration of an adaptive neural network-based controller to regulate the robot velocity in the presence of model parameter uncertainties. The proposed approach yields improved tracking performance under dynamics uncertainties. The shallow adaptive neural network used in this paper does not require training a priori and has the potential to be implemented on the real-time robotic controller. A comparative simulation study of a 3D Cassie robot is presented to illustrate the performance of the proposed approach under various scenarios.
LGJul 30, 2020
Momentum Q-learning with Finite-Sample Convergence GuaranteeBowen Weng, Huaqing Xiong, Lin Zhao et al.
Existing studies indicate that momentum ideas in conventional optimization can be used to improve the performance of Q-learning algorithms. However, the finite-sample analysis for momentum-based Q-learning algorithms is only available for the tabular case without function approximations. This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee. Specifically, we propose the MomentumQ algorithm, which integrates the Nesterov's and Polyak's momentum schemes, and generalizes the existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling. In particular, we characterize the finite-sample convergence rate which is provably faster than the vanilla Q-learning. This is the first finite-sample analysis for momentum-based Q-learning algorithms with function approximations. For the tabular case under synchronous sampling, we also obtain a finite-sample convergence rate that is slightly better than the SpeedyQ \citep{azar2011speedy} when choosing a special family of step sizes. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
OCJul 15, 2020
Analysis of Q-learning with Adaptation and Momentum Restart for Gradient DescentBowen Weng, Huaqing Xiong, Yingbin Liang et al.
Existing convergence analyses of Q-learning mostly focus on the vanilla stochastic gradient descent (SGD) type of updates. Despite the Adaptive Moment Estimation (Adam) has been commonly used for practical Q-learning algorithms, there has not been any convergence guarantee provided for Q-learning with such type of updates. In this paper, we first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update (a commonly adopted alternative of Adam for theoretical analysis). To further improve the performance, we propose to incorporate the momentum restart scheme to Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform the vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.
ROMay 20, 2020
Model Predictive Instantaneous Safety Metric for Evaluation of Automated Driving SystemsBowen Weng, Sughosh J. Rao, Eeshan Deosthale et al.
Vehicles with Automated Driving Systems (ADS) operate in a high-dimensional continuous system with multi-agent interactions. This continuous system features various types of traffic agents (non-homogeneous) governed by continuous-motion ordinary differential equations (differential-drive). Each agent makes decisions independently that may lead to conflicts with the subject vehicle (SV), as well as other participants (non-cooperative). A typical vehicle safety evaluation procedure that uses various safety-critical scenarios and observes resultant collisions (or near collisions), is not sufficient enough to evaluate the performance of the ADS in terms of operational safety status maintenance. In this paper, we introduce a Model Predictive Instantaneous Safety Metric (MPrISM), which determines the safety status of the SV, considering the worst-case safety scenario for a given traffic snapshot. The method then analyzes the SV's closeness to a potential collision within a certain evaluation time period. The described metric induces theoretical guarantees of safety in terms of the time to collision under standard assumptions. Through formulating the solution as a series of minimax quadratic optimization problems of a specific structure, the method is tractable for real-time safety evaluation applications. Its capabilities are demonstrated with synthesized examples and cases derived from real-world tests.
ROOct 24, 2019
Reciprocal Collision Avoidance for General Nonlinear Agents using Reinforcement LearningHao Li, Bowen Weng, Abhishek Gupta et al.
Finding feasible and collision-free paths for multiple nonlinear agents is challenging in the decentralized scenarios due to limited available information of other agents and complex dynamics constraints. In this paper, we propose a fast multi-agent collision avoidance algorithm for general nonlinear agents with continuous action space, where each agent observes only positions and velocities of nearby agents. To reduce online computation, we first decompose the multi-agent scenario and solve a two agents collision avoidance problem using reinforcement learning (RL). When extending the trained policy to a multi-agent problem, safety is ensured by introducing the optimal reciprocal collision avoidance (ORCA) as linear constraints and the overall collision avoidance action could be found through simple convex optimization. Most existing RL-based multi-agent collision avoidance algorithms rely on the direct control of agent velocities. In sharp contrasts, our approach is applicable to general nonlinear agents. Realistic simulations based on nonlinear bicycle agent models are performed with various challenging scenarios, indicating a competitive performance of the proposed method in avoiding collisions, congestion and deadlock with smooth trajectories.
OCOct 21, 2019
History-Gradient Aided Batch Size Adaptation for Variance Reduced AlgorithmsKaiyi Ji, Zhe Wang, Bowen Weng et al.
Variance-reduced algorithms, although achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to accelerate such algorithms. However, existing schemes either apply prescribed batch-size adaption rule or exploit the information along optimization path via additional backtracking and condition verification steps. In this paper, we propose a novel scheme, which eliminates backtracking line search but still exploits the information along optimization path by adapting the batch size via history stochastic gradients. We further theoretically show that such a scheme substantially reduces the overall complexity for popular variance-reduced algorithms SVRG and SARAH/SPIDER for both conventional nonconvex optimization and reinforcement learning problems. To this end, we develop a new convergence analysis framework to handle the dependence of the batch size on history stochastic gradients. Extensive experiments validate the effectiveness of the proposed batch-size adaptation scheme.
ROOct 3, 2019
Hybrid Zero Dynamics Inspired Feedback Control Policy Design for 3D Bipedal Locomotion using Reinforcement LearningGuillermo A. Castillo, Bowen Weng, Wei Zhang et al.
This paper presents a novel model-free reinforcement learning (RL) framework to design feedback control policies for 3D bipedal walking. Existing RL algorithms are often trained in an end-to-end manner or rely on prior knowledge of some reference joint trajectories. Different from these studies, we propose a novel policy structure that appropriately incorporates physical insights gained from the hybrid nature of the walking dynamics and the well-established hybrid zero dynamics approach for 3D bipedal walking. As a result, the overall RL framework has several key advantages, including lightweight network structure, short training time, and less dependence on prior knowledge. We demonstrate the effectiveness of the proposed method on Cassie, a challenging 3D bipedal robot. The proposed solution produces stable limit walking cycles that can track various walking speed in different directions. Surprisingly, without specifically trained with disturbances to achieve robustness, it also performs robustly against various adversarial forces applied to the torso towards both the forward and the backward directions.
LGMay 7, 2019
Accelerated Target Updates for Q-learningBowen Weng, Huaqing Xiong, Wei Zhang
This paper studies accelerations in Q-learning algorithms. We propose an accelerated target update scheme by incorporating the historical iterates of Q functions. The idea is conceptually inspired by the momentum-based accelerated methods in the optimization theory. Conditions under which the proposed accelerated algorithms converge are established. The algorithms are validated using commonly adopted testing problems in reinforcement learning, including the FrozenLake grid world game, two discrete-time LQR problems from the Deepmind Control Suite, and the Atari 2600 games. Simulation results show that the proposed accelerated algorithms can improve the convergence performance compared with the vanilla Q-learning algorithm.
ROOct 3, 2018
Reinforcement Learning Meets Hybrid Zero Dynamics: A Case Study for RABBITGuillermo A. Castillo, Bowen Weng, Ayonga Hereid et al.
The design of feedback controllers for bipedal robots is challenging due to the hybrid nature of its dynamics and the complexity imposed by high-dimensional bipedal models. In this paper, we present a novel approach for the design of feedback controllers using Reinforcement Learning (RL) and Hybrid Zero Dynamics (HZD). Existing RL approaches for bipedal walking are inefficient as they do not consider the underlying physics, often requires substantial training, and the resulting controller may not be applicable to real robots. HZD is a powerful tool for bipedal control with local stability guarantees of the walking limit cycles. In this paper, we propose a non traditional RL structure that embeds the HZD framework into the policy learning. More specifically, we propose to use RL to find a control policy that maps from the robot's reduced order states to a set of parameters that define the desired trajectories for the robot's joints through the virtual constraints. Then, these trajectories are tracked using an adaptive PD controller. The method results in a stable and robust control policy that is able to track variable speed within a continuous interval. Robustness of the policy is evaluated by applying external forces to the torso of the robot. The proposed RL framework is implemented and demonstrated in OpenAI Gym with the MuJoCo physics engine based on the well-known RABBIT robot model.