RO, LG · Jan 5, 2021

An A* Curriculum Approach to Reinforcement Learning for RGBD Indoor Robot Navigation

Kaushik Balakrishnan, Punarjay Chakravarty, Shubham Shrivastava

arXiv:2101.01774v1 · 7.38 citations

Originality: Incremental advance

AI Analysis

This work is significant for robotics researchers and industry aiming to deploy co-bots and last-mile delivery robots, as it offers a more efficient training strategy for direct-perception DRL navigation policies.

This paper addresses the data inefficiency in training end-to-end Deep Reinforcement Learning (DRL) agents for robot navigation by separating perception and control training. It uses a pre-trained Variational AutoEncoder (VAE) for RGBD compression and a curriculum learning approach guided by A* path planning, resulting in improved performance and decreased training times for the PointNav task in Habitat.

Training robots to navigate diverse environments is a challenging problem, as it involves the confluence of several different perception tasks such as mapping and localization, followed by optimal path-planning and control. Recently released photo-realistic simulators such as Habitat allow for the training of networks that output control actions directly from perception: agents use Deep Reinforcement Learning (DRL) to regress directly from the camera image to a control output in an end-to-end fashion. This end-to-end training is data-inefficient and can take several days on a GPU. Our paper tries to overcome this problem by separating the training of the perception and control neural nets and increasing the path complexity gradually using a curriculum approach. Specifically, a pre-trained twin Variational AutoEncoder (VAE) is used to compress RGBD (RGB and depth) sensing from an environment into a latent embedding, which is then used to train a DRL-based control policy. A*, a traditional path-planner, is used as a guide for the policy, and the distance between start and target locations is incrementally increased along the A* route as training progresses. We demonstrate the efficacy of the proposed approach, both in terms of increased performance and decreased training times, for the PointNav task in the Habitat simulation environment. This strategy of improving the training of direct-perception-based DRL navigation policies is expected to hasten the deployment of robots of particular interest to industry, such as co-bots on the factory floor and last-mile delivery robots.
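The curriculum idea in the abstract (move the training target progressively farther along an A* route) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy 2D occupancy grid with 4-connected moves instead of a Habitat scene, and the names `astar`, `curriculum_goal`, and the `progress` parameter are hypothetical, chosen only for illustration.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) / 1 (blocked) cells.

    Returns a list of (row, col) cells from start to goal, or None if
    the goal is unreachable. Toy stand-in for a real scene planner.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None


def curriculum_goal(path, progress):
    """Pick an intermediate target on the A* path.

    progress is a hypothetical training-progress fraction in [0, 1]:
    near 0 the target is one step from the start, at 1 it is the
    final goal. A linear schedule is assumed here for simplicity.
    """
    progress = min(max(progress, 0.0), 1.0)
    last = len(path) - 1
    if last <= 0:
        return path[0]
    return path[max(1, int(progress * last))]


# A wall forces a detour, so the A* route is longer than the straight line.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
route = astar(GRID, (0, 0), (2, 0))
targets = [curriculum_goal(route, p) for p in (0.0, 0.5, 1.0)]
```

In this sketch, early episodes would train the policy toward a target one step along the route, and later episodes toward targets farther along it, until the full start-to-goal task is reached. How progress is actually scheduled in the paper is not stated in the abstract.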
