NE LG Jan 3, 2023

Genetic Imitation Learning by Reward Extrapolation

arXiv:2301.07182v14.91 citations h-index: 26

Originality: Incremental advance

AI Analysis

This work targets data efficiency and reward estimation in imitation learning. It is an incremental improvement over existing extrapolation methods.

The paper tackles imitation learning's data inefficiency and inaccurate reward estimation by using a Genetic Algorithm to reproduce diverse trajectories. With limited data, this improves extrapolation accuracy, robustness, and policy performance in the Atari and MuJoCo domains.

Imitation learning demonstrates remarkable performance in various domains. However, it is also constrained by many prerequisites. The research community has worked intensively to alleviate these constraints, for example by adding a stochastic policy to avoid unseen states, eliminating the need for action labels, and learning from suboptimal demonstrations. Inspired by the natural reproduction process, we propose a method called GenIL that integrates the Genetic Algorithm with imitation learning. The Genetic Algorithm improves data efficiency by reproducing trajectories with various returns, and it helps the model estimate more accurate and compact reward function parameters. We tested GenIL in both the Atari and MuJoCo domains, and the results show that it outperforms previous extrapolation methods in extrapolation accuracy, robustness, and overall policy performance when input data is limited.
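To make the idea of "reproducing trajectories" more concrete, here is a minimal sketch of generic genetic operators applied to demonstrations. It is an illustration under stated assumptions, not the GenIL implementation: the abstract does not specify the operators, so the encoding of a trajectory as a list of discrete action ids, the single-point crossover, the uniform mutation, and the names `crossover`, `mutate`, and `reproduce` are all hypothetical. In practice each child would presumably also have to be replayed in the environment to obtain its return, which this sketch leaves out.

```python
import random
from typing import List, Tuple

# Assumption: a trajectory is encoded as a list of discrete action ids.
Trajectory = List[int]


def crossover(a: Trajectory, b: Trajectory, rng: random.Random) -> Tuple[Trajectory, Trajectory]:
    """Single-point crossover: swap the tails of two parents at a random cut."""
    limit = min(len(a), len(b))
    if limit < 2:
        # Too short to cut; return copies of the parents unchanged.
        return list(a), list(b)
    cut = rng.randint(1, limit - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(traj: Trajectory, n_actions: int, rate: float, rng: random.Random) -> Trajectory:
    """Replace each action by a uniformly random action with probability `rate`."""
    return [rng.randrange(n_actions) if rng.random() < rate else act for act in traj]


def reproduce(
    population: List[Trajectory],
    n_children: int,
    n_actions: int,
    rate: float = 0.1,
    seed: int = 0,
) -> List[Trajectory]:
    """Create `n_children` new action sequences from randomly paired parents."""
    rng = random.Random(seed)
    children: List[Trajectory] = []
    while len(children) < n_children:
        parent_a, parent_b = rng.sample(population, 2)
        for child in crossover(parent_a, parent_b, rng):
            children.append(mutate(child, n_actions, rate, rng))
    return children[:n_children]


if __name__ == "__main__":
    demos = [[0] * 6, [1] * 6, [2] * 6]
    for child in reproduce(demos, n_children=4, n_actions=3):
        print(child)
```

Mixing segments of demonstrations with different returns is one plausible way to obtain offspring whose returns span a wider range than the original data, which is the property the abstract attributes to the Genetic Algorithm step.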
