RO Mar 16, 2020
Visual Task Progress Estimation with Appearance Invariant Embeddings for Robot Control and Planning
Guilherme Maeda, Joni Väätäinen, Hironori Yoshida
One of the challenges of full autonomy is to have a robot capable of manipulating its current environment to achieve another environment configuration. This paper is a step towards this challenge, focusing on the visual understanding of the task. Our approach trains a deep neural network to represent images as measurable features that are useful to estimate the progress (or phase) of a task. Training uses many appearance variations of images of the same task taken at the same phase index. The goal is to make the network sensitive to differences in task progress but insensitive to the appearance of the images. To this end, our method builds upon Time-Contrastive Networks (TCNs) to train a network using only discrete snapshots taken at different stages of a task. A robot can then solve long-horizon tasks by using the trained network to identify the progress of the current task and by iteratively calling a motion planner until the task is solved. We quantify the granularity achieved by the network in two simulated environments: in the first, the network detects the number of objects in a scene, and in the second, it measures the volume of particles in a cup. Our experiments leverage this granularity to make a mobile robot move a desired number of objects into a storage area and to control the amount poured into a cup.
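As a rough illustration of the contrastive idea described in this abstract, the sketch below computes a standard hinge triplet loss of the kind commonly used with time-contrastive training. It is not the authors' implementation: the 2-D embeddings, the phase labels in the comments, the margin value, and the function names are all hypothetical. The anchor and positive stand for two images at the same phase index with different appearance, and the negative stands for an image at a different phase.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss.

    Zero when the positive is closer to the anchor than the negative
    by at least `margin`; positive otherwise.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical embeddings (illustrative values only).
anchor = [0.0, 1.0]    # e.g. phase 3, appearance A
positive = [0.1, 0.9]  # e.g. phase 3, appearance B
negative = [1.0, 0.0]  # e.g. phase 7, appearance A

# Well-separated phases incur no loss.
loss_good = triplet_loss(anchor, positive, negative)

# Swapping positive and negative mimics an embedding that confuses
# phases, which the loss penalizes.
loss_bad = triplet_loss(anchor, negative, positive)
```

Minimizing such a loss over many triplets pushes the embedding to group images by phase rather than by appearance, which is the invariance the abstract aims for.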
LG Feb 1, 2020
Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning
Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda
Off-policy ensemble reinforcement learning (RL) methods have demonstrated impressive results across a range of RL benchmark tasks. Recent works suggest that directly imitating experts' policies in a supervised manner before or during the course of training enables faster policy improvement for an RL agent. Motivated by these recent insights, we propose Periodic Intra-Ensemble Knowledge Distillation (PIEKD). PIEKD is a learning framework that uses an ensemble of policies to act in the environment while periodically sharing knowledge amongst policies in the ensemble through knowledge distillation. Our experiments demonstrate that PIEKD improves upon a state-of-the-art RL method in sample efficiency on several challenging MuJoCo benchmark tasks. Additionally, we perform ablation studies to better understand PIEKD.
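The periodic sharing step can be pictured with a toy sketch. Everything below is an assumption made for illustration and is not taken from the abstract: policies are reduced to logits over three discrete actions for a single state (the MuJoCo tasks are continuous-control), the teacher is chosen as the ensemble member with the highest recent return, and distillation is plain gradient descent on the KL divergence from teacher to student. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient step on KL(teacher || student) w.r.t. student logits.

    The gradient is softmax(student_logits) - teacher_probs.
    """
    student_probs = softmax(student_logits)
    return [s - lr * (sp - tp)
            for s, sp, tp in zip(student_logits, student_probs, teacher_probs)]


def periodic_distillation(ensemble_logits, recent_returns, steps=50):
    """Distill the best-returning member into the others (toy assumption)."""
    teacher_idx = max(range(len(recent_returns)), key=recent_returns.__getitem__)
    teacher_probs = softmax(ensemble_logits[teacher_idx])
    for i in range(len(ensemble_logits)):
        if i == teacher_idx:
            continue
        for _ in range(steps):
            ensemble_logits[i] = distill_step(ensemble_logits[i], teacher_probs)
    return teacher_idx


# Hypothetical ensemble of three policies and their recent returns.
ensemble = [[2.0, 0.0, -1.0], [0.0, 0.0, 0.0], [-1.0, 1.0, 0.5]]
returns = [120.0, 35.0, 60.0]

teacher_probs = softmax(ensemble[0])
kl_before = kl_divergence(teacher_probs, softmax(ensemble[1]))
teacher_idx = periodic_distillation(ensemble, returns)
kl_after = kl_divergence(teacher_probs, softmax(ensemble[1]))
```

In a full training loop, a call like `periodic_distillation` would be interleaved with ordinary RL updates at a fixed interval, so ensemble members keep exploring between sharing phases.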
RO Dec 7, 2019
Phase Portraits as Movement Primitives for Fast Humanoid Robot Control
Guilherme Maeda, Okan Koc, Jun Morimoto
Current approaches to fast robot control rely largely on solving online optimal control problems. Such methods are known to be computationally intensive and sensitive to model accuracy. On the other hand, animals plan complex motor actions not only fast but seemingly with little effort, even on unseen tasks. This natural ability to infer temporal dynamics and coordination motivates us to approach robot control from a motor skill learning perspective to design fast and computationally light controllers that can be learned autonomously by the robot under mild modeling assumptions. This article introduces Phase Portrait Movement Primitives (PPMP), a primitive that predicts dynamics on a low-dimensional phase space, which in turn is used to govern the high-dimensional kinematics of the task. The stark difference from other primitive formulations is a built-in mechanism for phase prediction in the form of coupled oscillators that replaces model-based state estimators such as Kalman filters. The policy is trained by optimizing the parameters of the oscillators whose output is connected to a kinematic distribution in the form of a phase portrait. The drastic reduction in dimensionality allows us to efficiently train and execute PPMPs on a real human-sized, dual-arm humanoid upper body on a task involving 20 degrees of freedom. We demonstrate PPMPs in interactions requiring fast reaction times while generating anticipative pose adaptation in both discrete and cyclic tasks.
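To give a feel for how coupled oscillators can keep two phases synchronized, here is a generic Kuramoto-style sketch. It is an assumption for illustration only: the abstract does not specify the oscillator model used in PPMP, and the coupling form, frequencies, gain, and function name below are hypothetical.

```python
import math


def simulate_coupled_phases(omega_a, omega_b, coupling,
                            phase_a=1.0, phase_b=0.0, dt=0.01, steps=2000):
    """Euler-integrate two mutually coupled phase oscillators.

    Each phase advances at its natural frequency plus a sinusoidal
    coupling term that pulls it toward the other oscillator's phase.
    """
    for _ in range(steps):
        rate_a = omega_a + coupling * math.sin(phase_b - phase_a)
        rate_b = omega_b + coupling * math.sin(phase_a - phase_b)
        phase_a += dt * rate_a
        phase_b += dt * rate_b
    return phase_a, phase_b


# Oscillators with different natural frequencies (hypothetical values).
phase_a, phase_b = simulate_coupled_phases(omega_a=2.5, omega_b=2.0, coupling=1.0)
phase_lag = phase_a - phase_b

# For this model the lag settles where
# (omega_a - omega_b) = 2 * coupling * sin(lag).
expected_lag = math.asin((2.5 - 2.0) / (2 * 1.0))
```

Once locked, the phase of one oscillator predicts the other up to a constant lag without an explicit dynamics model, which is the kind of role the abstract assigns to the oscillators in place of a model-based estimator.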
RO Jul 5, 2018
Optimizing Execution of Dynamic Goal-Directed Robot Movements with Learning Control
Okan Koc, Guilherme Maeda, Jan Peters
Highly dynamic tasks that require large accelerations and precise tracking usually rely on accurate models and/or high-gain feedback. While kinematic optimization allows for efficient representation and online generation of hitting trajectories, learning to track such dynamic movements with inaccurate models remains an open problem. In particular, stability issues surrounding the learning performance, in the iteration domain, can prevent the successful implementation of model-based learning approaches. To achieve accurate tracking for such tasks in a stable and efficient way, we propose a new adaptive Iterative Learning Control (ILC) algorithm that is implemented efficiently using a recursive approach. Moreover, covariance estimates of model matrices are used to exercise caution during learning. We evaluate the performance of the proposed approach in extensive simulations and on our robotic table tennis platform, where we show how the striking performance of two seven-degree-of-freedom anthropomorphic robot arms can be optimized. Our implementation on the table tennis platform compares favorably with high-gain PD control, model-free ILC (simple PD feedback type) and model-based ILC without cautious adaptation.
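For readers unfamiliar with ILC, the sketch below shows the basic idea of learning in the iteration domain using the simplest proportional-type, model-free update, which is closer to the baseline mentioned in this abstract than to the proposed adaptive, cautious algorithm. The scalar plant, its parameters, the learning gain, and the reference trajectory are all hypothetical choices made so the example converges.

```python
import math


def rollout(inputs, a=0.3, b=1.0):
    """Simulate x[t+1] = a * x[t] + b * u[t] from x[0] = 0.

    Returns the states x[1], ..., x[T], so outputs[t] is the state
    produced by applying inputs[t].
    """
    state, outputs = 0.0, []
    for u in inputs:
        state = a * state + b * u
        outputs.append(state)
    return outputs


def p_type_ilc(reference, iterations=20, gain=0.8):
    """Repeat the same task, correcting the input with the last trial's error."""
    inputs = [0.0] * len(reference)
    error_norms = []
    for _ in range(iterations):
        outputs = rollout(inputs)
        errors = [r - y for r, y in zip(reference, outputs)]
        error_norms.append(math.sqrt(sum(e * e for e in errors)))
        # Update rule: u_{k+1}(t) = u_k(t) + gain * e_k(t + 1)
        inputs = [u + gain * e for u, e in zip(inputs, errors)]
    return inputs, error_norms


# Hypothetical reference trajectory over 20 time steps.
reference = [math.sin(0.3 * t) for t in range(1, 21)]
learned_inputs, error_norms = p_type_ilc(reference)
```

With these particular plant and gain values the tracking error shrinks from trial to trial; with a poorly matched gain or model the error can grow across iterations, which is the stability concern in the iteration domain that the abstract raises.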