CV · Feb 24, 2017

How hard is it to cross the room? -- Training (Recurrent) Neural Networks to steer a UAV

arXiv:1702.07600v1 · 38 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of vision-based UAV control for robotics applications, but it is an incremental contribution: it builds on existing imitation learning frameworks and focuses on a proof of concept in simulation.

The authors train neural networks to steer a UAV from camera input for navigation tasks. They achieve successful control with an LSTM network in simulation and compare training choices such as WW-TBPTT sampling and retraining only the control layers versus training end-to-end.

This work explores the feasibility of steering a drone with a (recurrent) neural network, based on input from a forward-looking camera, in the context of a high-level navigation task. We set up a generic framework for training a network to perform navigation tasks based on imitation learning. It can be applied to both aerial and land vehicles. As a proof of concept, we apply it to a UAV (Unmanned Aerial Vehicle) in a simulated environment, learning to cross a room containing a number of obstacles. So far, only feedforward neural networks (FNNs) have been trained for UAV control. To cope with more complex tasks, we propose using recurrent neural networks (RNNs) instead and successfully train an LSTM (Long Short-Term Memory) network for controlling UAVs. Vision-based control is a sequential prediction problem, known for its highly correlated input data. This correlation makes training a network hard, especially an RNN. To overcome this issue, we investigate an alternative sampling method during training, namely window-wise truncated backpropagation through time (WW-TBPTT). Further, end-to-end training requires a lot of data, which is often not available. Therefore, we compare the performance of networks in which only the fully connected (FC) and LSTM control layers are retrained with networks that are trained end-to-end. Performing the relatively simple task of crossing a room already reveals important guidelines and good practices for training neural control networks. Different visualizations help to explain the learned behavior.
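The abstract names window-wise truncated backpropagation through time (WW-TBPTT) but does not spell out the procedure. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation: each recorded flight is cut into fixed-length windows, windows from all flights are shuffled into mini-batches, and the recurrent state is reset at every window start so gradients flow back only within a window. To stay self-contained and stdlib-only, a scalar linear RNN stands in for the LSTM and one float stands in for each camera frame; the names `make_windows`, `sample_batches`, `window_bptt`, and `train` are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical sketch: a toy scalar RNN replaces the paper's network, and the
# window scheme below is an assumed reading of "window-wise TBPTT".
Frame = Tuple[float, float]  # (input feature, target control): stand-in for (image, steering label)


def make_windows(sequences: Sequence[Sequence[Frame]], window: int) -> List[Tuple[int, int]]:
    """Cut every recorded flight into non-overlapping windows of `window` frames.

    Returns (sequence_index, start_frame) pairs. A trailing remainder shorter
    than `window` is dropped (an assumption made for brevity)."""
    return [
        (s_idx, start)
        for s_idx, seq in enumerate(sequences)
        for start in range(0, len(seq) - window + 1, window)
    ]


def sample_batches(
    sequences: Sequence[Sequence[Frame]], window: int, batch_size: int, seed: int = 0
) -> Iterator[List[Sequence[Frame]]]:
    """Yield mini-batches of windows in shuffled order across all flights.

    Shuffling whole windows mixes flights and time points between updates,
    while each window keeps the short-range temporal order the RNN needs."""
    rng = random.Random(seed)
    windows = make_windows(sequences, window)
    rng.shuffle(windows)
    for i in range(0, len(windows), batch_size):
        yield [sequences[s][start:start + window] for s, start in windows[i:i + batch_size]]


def window_bptt(w: float, u: float, win: Sequence[Frame]) -> Tuple[float, float, float]:
    """Loss and gradients (dL/dw, dL/du) for one window of h_t = w*h_{t-1} + u*x_t.

    The hidden state is reset to 0 at the window start, so backpropagation is
    truncated at the window boundary. Loss: 0.5 * sum_t (h_t - y_t)^2."""
    xs = [x for x, _ in win]
    ys = [y for _, y in win]
    hs = [0.0]  # hs[t] is h_t; h_0 = 0 is the reset state
    for x in xs:
        hs.append(w * hs[-1] + u * x)
    loss = 0.5 * sum((h - y) ** 2 for h, y in zip(hs[1:], ys))
    dw = du = 0.0
    dh_next = 0.0  # part of dL/dh_t arriving from step t+1 (zero past the window end)
    for t in range(len(xs), 0, -1):
        dh = (hs[t] - ys[t - 1]) + dh_next
        dw += dh * hs[t - 1]
        du += dh * xs[t - 1]
        dh_next = dh * w  # pass the gradient back to h_{t-1}
    return loss, dw, du


def train(sequences, window=4, batch_size=2, epochs=20, lr=0.05, seed=0):
    """Plain SGD over shuffled windows; returns the learned (w, u)."""
    w, u = 0.0, 0.0
    for epoch in range(epochs):
        for batch in sample_batches(sequences, window, batch_size, seed + epoch):
            grads = [window_bptt(w, u, win)[1:] for win in batch]
            w -= lr * sum(g[0] for g in grads) / len(batch)
            u -= lr * sum(g[1] for g in grads) / len(batch)
    return w, u
```

Calling `train(flights, window=4)` on a list of flights, each a list of `(x, y)` pairs, returns the fitted `(w, u)`. The window length is the main design knob in this sketch: longer windows give the recurrent state more history per update, while shorter windows yield more independent samples to shuffle.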
