Yebin Wang

h-index: 9

Papers: 9

Citations: 359

Novelty: 53%

AI Score: 56

Ranked #21,110 of 201,326 authors (top 10%); #27 in SY (top 3%)

9 Papers

95.0 · CV · Jun 2 · Code

VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring

Hanjiang Hu, Yiyuan Pan, Jiaxing Li et al.

As AI systems increasingly assist humans in physical tasks, ensuring safety becomes paramount: physical actions carry immediate and irreversible consequences that digital errors do not. We introduce the Vision-Language Embodied Safety Agent (VLESA), a framework that monitors human activities from egocentric video and triggers real-time safety interventions when dangerous actions are predicted. VLESA addresses intent-dependent safety, where identical actions can be safe or dangerous depending on context. A dataset pairing egocentric frames with goal-conditioned safety annotations is introduced, enabling a goal-conditioned safety Q-filter trained via GRPO that evaluates actions with respect to inferred intent without retraining. In addition, an intent-action prediction agent is proposed to jointly infer goals and predict future actions from video. On the ASIMOV-2.0 benchmark, VLESA achieves higher intervention accuracy at the exact ground-truth frame compared to baselines, while the GRPO-trained Q-filter improves action safety by over 41 percentage points through goal-conditioned constrained decoding. Code is available at https://github.com/HanjiangHu/VLESA.

SY · Dec 14, 2017

Nonlinear Bayesian Estimation: From Kalman Filtering to a Broader Horizon

Huazhen Fang, Ning Tian, Yebin Wang et al.

This article presents an up-to-date tutorial review of nonlinear Bayesian estimation. State estimation for nonlinear systems has been a challenge encountered in a wide range of engineering fields, attracting decades of research effort. To date, one of the most promising and popular approaches is to view and address the problem from a Bayesian probabilistic perspective, which enables estimation of the unknown state variables by tracking their probabilistic distribution or statistics (e.g., mean and covariance) conditioned on the system's measurement data. This article offers a systematic introduction to the Bayesian state estimation framework and reviews various Kalman filtering (KF) techniques, progressing from the standard KF for linear systems to the extended KF, unscented KF and ensemble KF for nonlinear systems. It also overviews other prominent or emerging Bayesian estimation methods, including Gaussian filtering, Gaussian-sum filtering, particle filtering and moving horizon estimation, and extends the discussion of state estimation to more complicated problems such as simultaneous state and parameter/input estimation.
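
As a point of reference for the standard KF mentioned in the abstract above, the following is a minimal sketch of one predict/update cycle for a scalar linear system. The model constants (`F`, `H`, `Q`, `R`) and the measurement values are illustrative assumptions, not taken from the article.

```python
def kf_step(x, P, z, F=1.0, H=1.0, Q=0.01, R=0.25):
    """One predict/update cycle of a scalar linear Kalman filter.

    Assumed model (illustrative only):
        x_k = F * x_{k-1} + w,  w ~ N(0, Q)
        z_k = H * x_k + v,      v ~ N(0, R)
    """
    # Predict: propagate the mean and variance through the dynamics.
    x_pred = F * x
    P_pred = F * P * F + Q

    # Update: correct the prediction with the new measurement z.
    S = H * P_pred * H + R           # innovation variance
    K = P_pred * H / S               # Kalman gain
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new


# Track a roughly constant signal from a vague prior (made-up measurements).
x, P = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.95]:
    x, P = kf_step(x, P, z)
```

The pair `(x, P)` is the conditional mean and variance that the Bayesian view tracks; the extended, unscented and ensemble variants differ mainly in how they approximate these two quantities when the model is nonlinear.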

31.5 · RO · May 21

Verified Task-Space Motion Planning Under Joint-Space Constraints

Hanjiang Hu, Changliu Liu, Yebin Wang

Reactive task-space planners such as Bug2 operate with fixed Cartesian step sizes and are unaware of the manipulator's joint-angle limits. When the Jacobian is poorly conditioned, even small Cartesian steps can demand joint changes that exceed admissible bounds; clipping the joints to their limits causes tracking drift and can prevent goal reaching entirely. We address this by computing, at each planning step, the largest Cartesian hyperrectangle that is certifiably reachable under joint displacement bounds. Using a second-order polynomial approximation of the inverse kinematics and the S-procedure, we formulate a small semidefinite program whose solution yields the certified half-width $\lambda^\star$. An equivalent bisection procedure exploiting the quadratic structure solves the certification in sub-millisecond time. Integrating this certificate with Bug2 yields a planner whose step size adapts to local kinematic conditioning. In a statistical evaluation over 94 adversarial scenarios spanning six joint-limit settings, the SOS-verified planner achieves zero joint-limit violations with a 100% goal-reaching rate, whereas a standard Bug2 planner violates joint limits in 6-11% of steps and fails to reach the goal in up to 18% of scenarios.
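
The idea of bisecting on a half-width can be illustrated with a simplified sketch. It is not the paper's certificate: instead of the S-procedure and a semidefinite program, it uses a cruder absolute-value bound on an assumed quadratic inverse-kinematics model, dq_j ≈ Σ_i A[j][i]·dx_i + ½ Σ_{i,k} Hq[j][i][k]·dx_i·dx_k. The names `A`, `Hq`, `dq_max` and all numbers are hypothetical.

```python
def worst_case_joint_step(lam, A, Hq):
    """Conservative bound on |dq_j| over the box |dx_i| <= lam, per joint j.

    Triangle-inequality bound on the assumed quadratic model; it is
    non-decreasing in lam, which is what makes bisection valid.
    """
    bounds = []
    for a_row, h_mat in zip(A, Hq):
        lin = sum(abs(a) for a in a_row)
        quad = sum(abs(h) for h_row in h_mat for h in h_row)
        bounds.append(lam * lin + 0.5 * lam * lam * quad)
    return bounds


def certified_half_width(A, Hq, dq_max, lam_hi=1.0, tol=1e-9):
    """Largest lam in [0, lam_hi] whose bound respects every joint limit."""
    def ok(lam):
        steps = worst_case_joint_step(lam, A, Hq)
        return all(b <= m for b, m in zip(steps, dq_max))

    if ok(lam_hi):
        return lam_hi
    lo, hi = 0.0, lam_hi             # invariant: ok(lo) holds, ok(hi) fails
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical 2-joint example: joint 1 is poorly conditioned (large gain).
A = [[2.0, 0.0], [0.0, 10.0]]
Hq = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 4.0]]]
lam = certified_half_width(A, Hq, dq_max=[0.1, 0.1])
```

In this example the second joint is the binding one, so the admissible Cartesian step shrinks to roughly a fifth of what the first joint alone would allow. That is the sense in which such a step size adapts to local conditioning.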

6.6 · SY · Mar 26

Accelerating Bayesian Optimization for Nonlinear State-Space System Identification with Application to Lithium-Ion Batteries

Hao Tu, Jackson Fogelquist, Iman Askari et al.

This paper studies system identification for nonlinear state-space models, a problem that arises across many fields yet remains challenging in practice. Focusing on maximum likelihood estimation, we employ Bayesian optimization (BayesOpt) to address this problem by leveraging its derivative-free global search capability enabled by surrogate modeling of the likelihood function. Despite these advantages, standard BayesOpt often suffers from slow convergence, high computational cost, and practical difficulty in attaining global optima under limited computational budgets, especially for high-dimensional nonlinear models with many unknown parameters. To overcome these limitations, we propose an accelerated BayesOpt framework that integrates BayesOpt with the Nelder-Mead method. The heuristics-based Nelder-Mead method provides fast local search, thereby assisting BayesOpt when the surrogate model lacks fidelity or when over-exploration occurs in broad parameter spaces. The proposed framework incorporates a principled strategy to coordinate the two methods, effectively combining their complementary strengths. The resulting hybrid approach significantly improves both convergence speed and computational efficiency while maintaining strong global search performance. In addition, we leverage an implicit particle filtering method to enable accurate and efficient likelihood evaluation. We validate the proposed framework on the identification of the BattX model for lithium-ion batteries, which features ten state dimensions, 18 unknown parameters, and strong nonlinearity. Both simulation and experimental results demonstrate the effectiveness of the proposed approach as well as its advantages over alternative methods.

LG · Sep 24, 2025

Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning

Hanjiang Hu, Changliu Liu, Na Li et al.

Large Language Models (LLMs) have demonstrated remarkable capabilities in knowledge acquisition, reasoning, and tool use, making them promising candidates for autonomous agent applications. However, training LLM agents for complex multi-turn task planning faces significant challenges, including sparse episode-wise rewards, credit assignment across long horizons, and the computational overhead of reinforcement learning in multi-turn interaction settings. To this end, this paper introduces a novel approach that transforms multi-turn task planning into single-turn task reasoning problems, enabling efficient policy optimization through Group Relative Policy Optimization (GRPO) with dense and verifiable rewards from expert trajectories. Our theoretical analysis shows that GRPO improvement on single-turn task reasoning results in a higher multi-turn success probability within the minimal number of turns, as well as generalization to subtasks with shorter horizons. Experimental evaluation on the complex task planning benchmark demonstrates that our 1.5B parameter model trained with single-turn GRPO achieves superior performance compared to larger baseline models of up to 14B parameters, with success rates of 70% for long-horizon planning tasks with over 30 steps. We also theoretically and empirically validate strong cross-task generalizability: models trained on complex tasks can successfully complete all simpler subtasks.
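
For readers unfamiliar with GRPO, the following is a minimal sketch of its group-relative advantage computation as commonly described: each sampled response is scored against the mean and standard deviation of its own group. The binary rewards (1.0 when a sampled single-turn action matches the expert step, else 0.0) and the use of the population standard deviation are assumptions for illustration, not details confirmed by the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and spread.

    `rewards` holds one scalar per sampled response to the same prompt.
    `eps` guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four samples for one single-turn prompt:
# reward 1.0 if the proposed action matches the expert step, else 0.0.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat their group's average receive a positive advantage and the rest a negative one, so no separate value network is needed. When every sample in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.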

RO · Aug 20, 2025

Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations

Haitong Ma, Bo Dai, Zhaolin Ren et al.

Limited data has become a major bottleneck in scaling up offline imitation learning (IL). In this paper, we propose enhancing IL performance under limited expert data by introducing a pre-training stage that learns dynamics representations, derived from factorizations of the transition dynamics. We first theoretically justify that the optimal decision variable of offline IL lies in the representation space, significantly reducing the parameters to learn in the downstream IL. Moreover, the dynamics representations can be learned from arbitrary data collected with the same dynamics, allowing the reuse of massive non-expert data and mitigating the limited data issues. We present a tractable loss function inspired by noise contrastive estimation to learn the dynamics representations at the pre-training stage. Experiments on MuJoCo demonstrate that our proposed algorithm can mimic expert policies with as few as a single trajectory. Experiments on real quadrupeds show that we can leverage pre-trained dynamics representations from simulator data to learn to walk from a few real-world demonstrations.

CE · Dec 24, 2021

Integrating Physics-Based Modeling with Machine Learning for Lithium-Ion Batteries

Hao Tu, Scott Moura, Yebin Wang et al.

Mathematical modeling of lithium-ion batteries (LiBs) is a primary challenge in advanced battery management. This paper proposes two new frameworks to integrate physics-based models with machine learning to achieve high-precision modeling for LiBs. The frameworks are characterized by informing the machine learning model of the state information of the physical model, enabling a deep integration between physics and machine learning. Based on the frameworks, a series of hybrid models is constructed by combining an electrochemical model and an equivalent circuit model, respectively, with a feedforward neural network. The hybrid models are relatively parsimonious in structure and can provide considerable voltage predictive accuracy under a broad range of C-rates, as shown by extensive simulations and experiments. The study further extends to aging-aware hybrid modeling, leading to the design of a hybrid model that accounts for the state of health when making predictions. The experiments show that the model has high voltage predictive accuracy throughout a LiB's cycle life.

SY · Jul 3, 2019

Safe Approximate Dynamic Programming Via Kernelized Lipschitz Estimation

Ankush Chakrabarty, Devesh K. Jha, Gregery T. Buzzard et al.

We develop a method for obtaining safe initial policies for reinforcement learning via approximate dynamic programming (ADP) techniques for uncertain systems evolving with discrete-time dynamics. We employ kernelized Lipschitz estimation and semidefinite programming for computing admissible initial control policies with provably high probability. Such admissible controllers enable safe initialization and constraint enforcement while providing exponential stability of the equilibrium of the closed-loop system.

SY · Mar 31, 2019

Robust Extended Kalman Filtering for Systems with Measurement Outliers

Huazhen Fang, Mulugeta A. Haile, Yebin Wang

Outliers, which can be caused by sensor errors, model uncertainties, changes in the ambient environment, data loss or malicious cyber attacks, can contaminate the measurement process of many nonlinear systems. When the extended Kalman filter (EKF) is applied to such systems for state estimation, the outliers can seriously reduce the estimation accuracy. This paper proposes an innovation saturation mechanism to modify the EKF toward building robustness against outliers. This mechanism applies a saturation function to the innovation process that the EKF leverages to correct the state estimation. As such, when an outlier occurs, the distorting innovation is saturated and thus prevented from damaging the state estimation. The mechanism features an adaptive adjustment of the saturation bound. The design leads to the development of robust EKF approaches for continuous- and discrete-time systems. They are proven to be capable of generating bounded-error estimation in the presence of bounded outlier disturbances. An application study about mobile robot localization is presented, with the numerical simulation showing the efficacy of the proposed design. Compared to existing methods, the proposed approaches can effectively reject outliers of various magnitudes, types and durations, at significant computational efficiency and without requiring measurement redundancy.
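
The core of innovation saturation can be sketched for a scalar measurement update. This is a simplified illustration under stated assumptions: the saturation bound is fixed, whereas the abstract describes an adaptive bound; the measurement model is linear; and the variance update is the standard one, which may differ from the paper's design. All names and numbers are hypothetical.

```python
def saturate(v, bound):
    """Clip v to the interval [-bound, bound]."""
    return max(-bound, min(bound, v))


def update_with_saturation(x_pred, P_pred, z, bound, H=1.0, R=0.25):
    """Scalar Kalman-style measurement update with a saturated innovation.

    Assumes z = H * x + v with v ~ N(0, R) and a fixed (non-adaptive) bound.
    """
    S = H * P_pred * H + R                   # innovation variance
    K = P_pred * H / S                       # Kalman gain
    innovation = z - H * x_pred
    # Saturating the innovation caps how far one measurement can move x.
    x_new = x_pred + K * saturate(innovation, bound)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new


# A normal measurement passes through unchanged; an outlier is capped.
x_clean, _ = update_with_saturation(1.0, 0.5, z=1.2, bound=1.0)
x_outlier, _ = update_with_saturation(1.0, 0.5, z=50.0, bound=1.0)
```

With the gain `K = 2/3` used here, the outlier at `z = 50.0` shifts the estimate by at most `K * bound`, whereas an unsaturated update would move it by `K * 49`.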