AIMay 12, 2022
A Generalist AgentScott Reed, Konrad Zolna, Emilio Parisotto et al.
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
LGMar 31, 2023
Accelerating exploration and representation learning with offline pre-trainingBogdan Mazoure, Jake Bruce, Doina Precup et al.
Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward separately from a single collection of human demonstrations can significantly improve the sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
LGFeb 23, 2024
Genie: Generative Interactive EnvironmentsJake Bruce, Michael Dennis, Ashley Edwards et al. · oxford
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
LGNov 20, 2019Code
Evaluating task-agnostic exploration for fixed-batch learning of arbitrary future tasksVibhavari Dasagi, Robert Lee, Jake Bruce et al.
Deep reinforcement learning has been shown to solve challenging tasks where large amounts of training experience is available, usually obtained online while learning the task. Robotics is a significant potential application domain for many of these algorithms, but generating robot experience in the real world is expensive, especially when each task requires a lengthy online training procedure. Off-policy algorithms can in principle learn arbitrary tasks from a diverse enough fixed dataset. In this work, we evaluate popular exploration methods by generating robotics datasets for the purpose of learning to solve tasks completely offline without any further interaction in the real world. We present results on three popular continuous control tasks in simulation, as well as continuous control of a high-dimensional real robot arm. Code documenting all algorithms, experiments, and hyper-parameters is available at https://github.com/qutrobotlearning/batchlearning.
CVFeb 27, 2024
Video as the New Language for Real-World Decision MakingSherry Yang, Jacob Walker, Jack Parker-Holder et al. · mit, oxford
Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that is difficult to express in language. To address this gap, we discuss an under-appreciated opportunity to extend video generation to solve tasks in the real world. We observe how, akin to language, video can serve as a unified interface that can absorb internet knowledge and represent diverse tasks. Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning. We identify major impact opportunities in domains such as robotics, self-driving, and science, supported by recent work that demonstrates how such advanced capabilities in video generation are plausibly within reach. Lastly, we identify key challenges in video generation that mitigate progress. Addressing these challenges will enable video generation models to demonstrate unique value alongside language models in a wider array of AI applications.
AIMay 18, 2023
A Generalist Dynamics Model for ControlIngmar Schubert, Jingwei Zhang, Jake Bruce et al.
We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist TDM is applied to an unseen environment without any further training. Here, we demonstrate that generalizing system dynamics can work much better than generalizing optimal behavior directly as a policy. Additional results show that TDMs also perform well in a single-environment learning setting when compared to a number of baseline models. These properties make TDMs a promising ingredient for a foundation model of control.
LGJul 8, 2021
Imitation by Predicting ObservationsAndrew Jaegle, Yury Sulsky, Arun Ahuja et al.
Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real-world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks while also exhibiting robustness in the presence of observations unrelated to the task. Our method, which we call FORM (for "Future Observation Reward Model") is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations, without needing ground truth actions. We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
LGOct 9, 2019
Ctrl-Z: Recovering from Instability in Reinforcement LearningVibhavari Dasagi, Jake Bruce, Thierry Peynot et al.
When learning behavior, training data is often generated by the learner itself; this can result in unstable training dynamics, and this problem has particularly important applications in safety-sensitive real-world control tasks such as robotics. In this work, we propose a principled and model-agnostic approach to mitigate the issue of unstable learning dynamics by maintaining a history of a reinforcement learning agent over the course of training, and reverting to the parameters of a previous agent whenever performance significantly decreases. We develop techniques for evaluating this performance through statistical hypothesis testing of continued improvement, and evaluate them on a standard suite of challenging benchmark tasks involving continuous control of simulated robots. We show improvements over state-of-the-art reinforcement learning algorithms in performance and robustness to hyperparameters, outperforming DDPG in 5 out of 6 evaluation environments and showing no decrease in performance with TD3, which is known to be relatively stable. In this way, our approach takes an important step towards increasing data efficiency and stability in training for real-world robotic applications.
LGSep 20, 2018
Sim-to-Real Transfer of Robot Learning with Variable Length InputsVibhavari Dasagi, Robert Lee, Serena Mou et al.
Current end-to-end deep Reinforcement Learning (RL) approaches require jointly learning perception, decision-making and low-level control from very sparse reward signals and high-dimensional inputs, with little capability of incorporating prior knowledge. This results in prohibitively long training times for use on real-world robotic tasks. Existing algorithms capable of extracting task-level representations from high-dimensional inputs, e.g. object detection, often produce outputs of varying lengths, restricting their use in RL methods due to the need for neural networks to have fixed length inputs. In this work, we propose a framework that combines deep sets encoding, which allows for variable-length abstract representations, with modular RL that utilizes these representations, decoupling high-level decision making from low-level control. We successfully demonstrate our approach on the robot manipulation task of object sorting, showing that this method can learn effective policies within mere minutes of highly simplified simulation. The learned policies can be directly deployed on a robot without further training, and generalize to variations of the task unseen during training.
ROJul 11, 2018
Learning Deployable Navigation Policies at Kilometer Scale from a Single TraversalJake Bruce, Niko Sünderhauf, Piotr Mirowski et al.
Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings to achieve a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer, allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine tuning, despite environmental appearance differences at test time. The dataset and code required to reproduce these results and apply the technique to other datasets and robots is made publicly available at rl-navigation.github.io/deployable.
AINov 28, 2017
One-Shot Reinforcement Learning for Robot Navigation with Interactive ReplayJake Bruce, Niko Suenderhauf, Piotr Mirowski et al.
Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fixed goal and in a known environment, on a mobile robot. The robot leverages an interactive world model built from a single traversal of the environment, a pre-trained visual feature encoder, and stochastic environmental augmentation, to demonstrate successful zero-shot transfer under real-world environmental variations without fine-tuning.
CVNov 20, 2017
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environmentsPeter Anderson, Qi Wu, Damien Teney et al.
A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.
ROJun 19, 2017
Look No Further: Adapting the Localization Sensory Window to the Temporal Characteristics of the EnvironmentJake Bruce, Adam Jacobson, Michael Milford
Many localization algorithms use a spatiotemporal window of sensory information in order to recognize spatial locations, and the length of this window is often a sensitive parameter that must be tuned to the specifics of the application. This letter presents a general method for environment-driven variation of the length of the spatiotemporal window based on searching for the most significant localization hypothesis, to use as much context as is appropriate but not more. We evaluate this approach on benchmark datasets using visual and Wi-Fi sensor modalities and a variety of sensory comparison front-ends under in-order and out-of-order traversals of the environment. Our results show that the system greatly reduces the maximum distance traveled without localization compared to a fixed-length approach while achieving competitive localization accuracy, and our proposed method achieves this performance without deployment-time tuning.