RO LG Oct 8, 2021

Towards Sample-efficient Apprenticeship Learning from Suboptimal Demonstration

Letian Chen, Rohan Paleja, Matthew Gombolay

arXiv:2110.04347v15.31 citations h-index: 42

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of enabling non-expert users to teach robots effectively, though the advance appears incremental because it builds on prior work such as SSRR.

The paper tackles learning from suboptimal demonstrations in robotics by proposing S3RR, a systematic alternative to noise-injection methods for degrading trajectories, and finds that its learned rewards correlate with the ground-truth reward comparably to or better than those of a state-of-the-art framework.

Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform novel tasks by providing demonstrations. However, as demonstrators are typically non-experts, modern LfD techniques are unable to produce policies much better than the suboptimal demonstration. A previously proposed framework, SSRR, has shown success in learning from suboptimal demonstration but relies on noise-injected trajectories to infer an idealized reward function. A random approach such as noise injection to generate trajectories has two key drawbacks: 1) performance degradation could be random, depending on whether the noise is applied to vital states, and 2) trajectories generated by noise injection may have limited suboptimality and therefore will not accurately represent the whole scope of suboptimality. We present Systematic Self-Supervised Reward Regression, S3RR, to investigate systematic alternatives for trajectory degradation. We carry out empirical evaluations and find that S3RR can learn a reward whose correlation with the ground truth is comparable to or better than that of a state-of-the-art framework for learning from suboptimal demonstration.
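To make the two ideas in the abstract concrete, here is a minimal sketch of noise-injected trajectory degradation and of a reward-correlation check. It is not the authors' implementation: the toy chain environment, the epsilon-greedy form of the noise, and the names `noisy_action`, `rollout_return`, and `pearson` are all assumptions chosen for illustration, and Pearson correlation is used only as one plausible way to measure agreement with ground truth.

```python
import random


def noisy_action(policy_action, action_space, epsilon, rng):
    """With probability epsilon, replace the policy's action with a uniformly
    random one (an assumed epsilon-greedy form of noise injection)."""
    if rng.random() < epsilon:
        return rng.choice(action_space)
    return policy_action


def rollout_return(epsilon, horizon=50, seed=0):
    """Toy 1-D chain (hypothetical): the demonstrator always moves right (+1).
    Each step right earns +1 and each step left earns -1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(horizon):
        total += noisy_action(+1, [-1, +1], epsilon, rng)
    return total


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of returns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Ground-truth returns of trajectories degraded at increasing noise levels.
    noise_levels = [0.0, 0.25, 0.5, 0.75, 1.0]
    true_returns = [rollout_return(eps, seed=i) for i, eps in enumerate(noise_levels)]
    # Stand-in for a learned reward's predictions: here simply 1 - epsilon.
    predicted = [1.0 - eps for eps in noise_levels]
    print("true returns:", true_returns)
    print("correlation:", pearson(predicted, true_returns))
```

In this sketch every step matters equally, so the return falls fairly smoothly as the noise level rises. The first drawback named in the abstract arises when only a few states are vital: the same noise level can then produce very different amounts of degradation depending on where the random actions happen to land.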
