LG, AI | Nov 23, 2022

Masked Autoencoding for Scalable and Generalizable Decision Making

Fangchen Liu, Hao Liu, Aditya Grover, Pieter Abbeel

arXiv:2211.12740v2 | 23.658 citations | h-index: 164 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of building generalizable decision-making agents for reinforcement learning, though it appears incremental as it adapts existing masked autoencoding techniques to this domain.

The paper tackles the problem of learning scalable reinforcement learning agents from large-scale sequential data. It introduces Masked Decision Prediction (MaskDP), a self-supervised pretraining method that applies masked autoencoding to state-action trajectories. The pretrained model transfers zero-shot to new tasks and, with data-efficient finetuning, performs competitively in offline RL.

We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data similar to current large vision and language models. To this end, this paper presents masked decision prediction (MaskDP), a simple and scalable self-supervised pretraining method for reinforcement learning (RL) and behavioral cloning (BC). In our MaskDP approach, we apply a masked autoencoder (MAE) to state-action trajectories, wherein we randomly mask state and action tokens and reconstruct the missing data. By doing so, the model is required to infer masked-out states and actions and extract information about dynamics. We find that masking different proportions of the input sequence significantly helps with learning a better model that generalizes well to multiple downstream tasks. In our empirical study, we find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching, and it can zero-shot infer skills from a few example transitions. In addition, MaskDP transfers well to offline RL and shows promising scaling behavior w.r.t. model size. It is amenable to data-efficient finetuning, achieving competitive results with prior methods based on autoregressive pretraining.
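The masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the interleaved token layout, the candidate mask ratios, the `None` placeholder standing in for a learned mask token, and the choice to score reconstruction on masked positions only are all assumptions made for illustration. The function names `interleave`, `random_mask`, and `reconstruction_loss` are hypothetical.

```python
import random

MASK = None  # hypothetical placeholder standing in for a learned mask token


def interleave(states, actions):
    """Flatten a trajectory into tokens [s0, a0, s1, a1, ...]."""
    tokens = []
    for s, a in zip(states, actions):
        tokens.append(("state", s))
        tokens.append(("action", a))
    return tokens


def random_mask(tokens, ratios=(0.15, 0.35, 0.55, 0.75, 0.95), rng=random):
    """Hide a randomly chosen proportion of state and action tokens.

    The ratio is drawn per call, so training sees different mask
    proportions. The candidate ratios here are illustrative only.
    Assumes a non-empty token list.
    """
    ratio = rng.choice(ratios)
    n_mask = max(1, round(ratio * len(tokens)))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_mask))
    hidden = set(masked_idx)
    visible = [
        (kind, MASK) if i in hidden else (kind, value)
        for i, (kind, value) in enumerate(tokens)
    ]
    return visible, masked_idx, ratio


def reconstruction_loss(predicted, tokens, masked_idx):
    """Mean squared error over masked positions only (an assumption)."""
    total, count = 0.0, 0
    for i in masked_idx:
        target = tokens[i][1]
        for p, t in zip(predicted[i], target):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    states = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
    actions = [[0.5], [0.5], [0.5], [0.5]]
    tokens = interleave(states, actions)
    visible, masked_idx, ratio = random_mask(tokens, rng=random.Random(0))
    print(f"mask ratio {ratio}: hid positions {masked_idx}")
```

In a real model, `predicted` would come from a network that reads the visible tokens. Here any list of per-position vectors works, which keeps the sketch independent of a deep learning framework.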
