69.7ROMay 27
World Models for Robotic Manipulation: A SurveyFangyuan Wang, Ziyuan Wang, Guorui Pei et al.
Robotic manipulation depends on the ability to anticipate how actions reshape objects, contacts, and scene geometry before execution. Learned world models provide this capability by predicting task-relevant future evolution under robot intervention, yet the term now spans latent dynamics models, action-conditioned video generators, three- and four-dimensional scene predictors, physics-informed simulators, and predictive modules inside vision-language-action systems. This breadth has fragmented the literature and obscured the design choices that matter for manipulation. We survey world models for robotic manipulation through three questions: what future representation is predicted, how prediction is connected to action, and when prediction is used in the robot-learning pipeline. We operationally define a world model as an action-conditioned predictive system and distinguish it from perception modules, inverse models, policies, rewards, and value functions. We then organize existing work into five representation families, develop a functional taxonomy that separates integrated prediction-action models from explicit predictive planners, and characterize infrastructure roles including synthetic experience generation, candidate filtering, search-based evaluation, learned environments, and outcome verification. We further map these roles across pretraining, post-training, and inference adaptation, review 34 manipulation datasets, and synthesize evaluation protocols for predictive fidelity, task performance, and simulator reliability. This survey shows that world models are evolving from task-specific dynamics predictors into predictive infrastructure for robot learning, while exposing open challenges in contact modeling, hallucination control, action alignment, and benchmarking under closed-loop use.
ROOct 16, 2020
Hyperparameter Auto-tuning in Self-Supervised Robotic LearningJiancong Huang, Juan Rojas, Matthieu Zimmer et al.
Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Fixing them incorrectly may negatively impact optimization performance leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources. The effects are further exacerbated when using single policies to solve multi-task learning problems. Observing that the Evidence Lower Bound (ELBO) used in Variational Auto-Encoders correlates with the diversity of image samples, we propose an auto-tuning technique based on the ELBO for self-supervised reinforcement learning. Our approach can auto-tune three hyperparameters: the replay buffer size, the number of policy gradient updates during each epoch, and the number of exploration steps during each epoch. We use a state-of-the-art self-supervised robot learning framework (Reinforcement Learning with Imagined Goals (RIG) using Soft Actor-Critic) as baseline for experimental verification. Experiments show that our method can auto-tune online and yields the best performance at a fraction of the time and computational resources. Code, video, and appendix for simulated and real-robot experiments can be found at the project page \url{www.JuanRojas.net/autotune}.
ROSep 12, 2018
Dynamic Interaction Probabilistic Movement PrimitivesShuangda Duan, Longxin Chen, Hongmin Wu et al.
Human-robot collaboration is on the rise. Robots need to increasingly improve the efficiency and smoothness with which they assist humans by properly anticipating a human's intention. To do so, prediction models need to increase their accuracy and responsiveness. This work builds on top of Interaction Movement Primitives with phase estimation and re-formulates the framework to use dynamic human-motion observations which constantly update anticipatory motions. The original framework only considers a single fixed-duration static human observation which is used to perform only one anticipatory motion. Dynamic observations, with built-in phase estimation, yield a series of updated robot motion distributions. Co-activation is performed between the existing and newest most probably robot motion distribution. This results in smooth anticipatory robot motions that are highly accurate and with enhanced responsiveness.
ROSep 11, 2018
Endowing Robots with Longer-term Autonomy by Recovering from External Disturbances in Manipulation through Grounded Anomaly Classification and Recovery PoliciesHongmin Wu, Shuangqi Luo, Longxin Chen et al.
Robot manipulation is increasingly poised to interact with humans in co-shared workspaces. Despite increasingly robust manipulation and control algorithms, failure modes continue to exist whenever models do not capture the dynamics of the unstructured environment. To obtain longer-term horizons in robot automation, robots must develop introspection and recovery abilities. We contribute a set of recovery policies to deal with anomalies produced by external disturbances as well as anomaly classification through the use of non-parametric statistics with memoized variational inference with scalable adaptation. A recovery critic stands atop of a tightly-integrated, graph-based online motion-generation and introspection system that resolves a wide range of anomalous situations. Policies, skills, and introspection models are learned incrementally and contextually in a task. Two task-level recovery policies: re-enactment and adaptation resolve accidental and persistent anomalies respectively. The introspection system uses non-parametric priors along with Markov jump linear systems and memoized variational inference with scalable adaptation to learn a model from the data. Extensive real-robot experimentation with various strenuous anomalous conditions is induced and resolved at different phases of a task and in different combinations. The system executes around-the-clock introspection and recovery and even elicited self-recovery when misclassifications occurred.
ROSep 24, 2017
Fast, Robust, and Versatile Event Detection through HMM Belief State Gradient MeasuresShuangqi Luo, Hongmin Wu, Hongbin Lin et al.
Event detection is a critical feature in data-driven systems as it assists with the identification of nominal and anomalous behavior. Event detection is increasingly relevant in robotics as robots operate with greater autonomy in increasingly unstructured environments. In this work, we present an accurate, robust, fast, and versatile measure for skill and anomaly identification. A theoretical proof establishes the link between the derivative of the log-likelihood of the HMM filtered belief state and the latest emission probabilities. The key insight is the inverse relationship in which gradient analysis is used for skill and anomaly identification. Our measure showed better performance across all metrics than related state-of-the art works. The result is broadly applicable to domains that use HMMs for event detection.
ROAug 1, 2017
Recovering from External Disturbances in Online Manipulation through State-Dependent Revertive Recovery PoliciesHongmin Wu, Hongbin Lin, Shuangqi Luo et al.
Robots are increasingly entering uncertain and unstructured environments. Within these, robots are bound to face unexpected external disturbances like accidental human or tool collisions. Robots must develop the capacity to respond to unexpected events. That is not only identifying the sudden anomaly, but also deciding how to handle it. In this work, we contribute a recovery policy that allows a robot to recovery from various anomalous scenarios across different tasks and conditions in a consistent and robust fashion. The system organizes tasks as a sequence of nodes composed of internal modules such as motion generation and introspection. When an introspection module flags an anomaly, the recovery strategy is triggered and reverts the task execution by selecting a target node as a function of a state dependency chart. The new skill allows the robot to overcome the effects of the external disturbance and conclude the task. Our system recovers from accidental human and tool collisions in a number of tasks. Of particular importance is the fact that we test the robustness of the recovery system by triggering anomalies at each node in the task graph showing robust recovery everywhere in the task. We also trigger multiple and repeated anomalies at each of the nodes of the task showing that the recovery system can consistently recover anywhere in the presence of strong and pervasive anomalous conditions. Robust recovery systems will be key enablers for long-term autonomy in robot systems. Supplemental info including code, data, graphs, and result analysis can be found at [1].
ROMay 24, 2017
Robot Introspection with Bayesian Nonparametric Vector Autoregressive Hidden Markov ModelsHongmin Wu, Hongbin Lin, Yisheng Guan et al.
Robot introspection, as opposed to anomaly detection typical in process monitoring, helps a robot understand what it is doing at all times. A robot should be able to identify its actions not only when failure or novelty occurs, but also as it executes any number of sub-tasks. As robots continue their quest of functioning in unstructured environments, it is imperative they understand what is it that they are actually doing to render them more robust. This work investigates the modeling ability of Bayesian nonparametric techniques on Markov Switching Process to learn complex dynamics typical in robot contact tasks. We study whether the Markov switching process, together with Bayesian priors can outperform the modeling ability of its counterparts: an HMM with Bayesian priors and without. The work was tested in a snap assembly task characterized by high elastic forces. The task consists of an insertion subtask with very complex dynamics. Our approach showed a stronger ability to generalize and was able to better model the subtask with complex dynamics in a computationally efficient way. The modeling technique is also used to learn a growing library of robot skills, one that when integrated with low-level control allows for robot online decision making.