Multi-modal Scene-compliant User Intention Estimation in Navigation
This work addresses the need for safer and more intuitive shared control of mobile agents such as wheelchairs, although it appears incremental, since it builds on existing GAN and LSTM methods.
The paper tackles the problem of predicting user intentions in mobile vehicle navigation by proposing a multi-modal framework that combines past trajectory data with visual traversability information. The approach reduces prediction error compared to existing methods like Social-GAN, demonstrating effectiveness even with small, un-annotated datasets.
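The abstract does not name the error metric. Trajectory-prediction work such as Social-GAN commonly reports Average Displacement Error (ADE) and Final Displacement Error (FDE), taking the best of K generated samples. The sketch below shows that common protocol. Treat it as an assumption about how "prediction error" is measured, not as this paper's confirmed evaluation code. The function names `ade`, `fde` and `best_of_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def _check(pred: Trajectory, truth: Trajectory) -> None:
    # Both trajectories must cover the same, non-empty prediction horizon.
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")


def ade(pred: Trajectory, truth: Trajectory) -> float:
    """Average displacement error: mean Euclidean distance over all predicted steps."""
    _check(pred, truth)
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def fde(pred: Trajectory, truth: Trajectory) -> float:
    """Final displacement error: Euclidean distance between the last predicted and true points."""
    _check(pred, truth)
    return math.dist(pred[-1], truth[-1])


def best_of_k(samples: Sequence[Trajectory], truth: Trajectory) -> Tuple[float, float]:
    """Hypothetical best-of-K evaluation: minimum ADE and minimum FDE over K generated samples."""
    return (min(ade(s, truth) for s in samples),
            min(fde(s, truth) for s in samples))
```

Scoring the best of K samples, rather than their average, rewards a generator whose *set* of samples covers the true future. This matters for a multi-modal model that is meant to output several plausible intentions.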
This work proposes a multi-modal framework that generates distributions of user intention when operating a mobile vehicle. The model learns from past observed trajectories and leverages traversability information derived from the visual surroundings to produce a set of future trajectories. These trajectories are suitable for direct embedding into a perception-action shared control strategy on a mobile agent, or for use as a safety layer that supervises the prudent operation of the vehicle. We base our solution on a conditional Generative Adversarial Network with Long Short-Term Memory cells to capture trajectory distributions conditioned on past trajectories, further fused with traversability probabilities derived from visual segmentation with a Convolutional Neural Network. The proposed data-driven framework significantly reduces the error of the predicted trajectories with respect to the ground truth, compared with strategies from the literature (e.g., Social-GAN) that do not account for information beyond the agent's past history. Experiments were conducted on a dataset collected with a custom wheelchair model built on the open-source urban driving simulator CARLA, and they also show that the proposed framework can be used with a small, un-annotated dataset.
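The abstract states that the generated trajectory distribution is "fused with traversability probabilities" from a CNN segmentation. It does not say how. The sketch below assumes one simple option: a post-hoc re-ranking of K sampled trajectories against a traversability grid. It is not the paper's fusion mechanism, which may instead condition the generator directly. The grid layout, the `cell_size` parameter, the geometric-mean score and the `threshold` filter are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]
Grid = Sequence[Sequence[float]]  # assumed layout: grid[row][col] = P(traversable) in [0, 1]


def cell_probability(grid: Grid, point: Point, cell_size: float) -> float:
    """Look up P(traversable) for a metric (x, y) point; off-map points count as non-traversable."""
    col = math.floor(point[0] / cell_size)  # x maps to column
    row = math.floor(point[1] / cell_size)  # y maps to row
    if 0 <= row < len(grid) and 0 <= col < len(grid[row]):
        return grid[row][col]
    return 0.0


def traversability_score(traj: Trajectory, grid: Grid, cell_size: float,
                         eps: float = 1e-6) -> float:
    """Geometric mean of per-step traversability, so trajectories of any length compare fairly.

    eps keeps log() finite when a step lands on a cell with probability 0.
    """
    logs = [math.log(max(cell_probability(grid, p, cell_size), eps)) for p in traj]
    return math.exp(sum(logs) / len(logs))


def rerank(samples: Sequence[Trajectory], grid: Grid, cell_size: float,
           threshold: float = 0.5) -> List[Tuple[float, Trajectory]]:
    """Hypothetical fusion step: drop samples scoring below threshold, most traversable first."""
    scored = [(traversability_score(s, grid, cell_size), s) for s in samples]
    kept = [(score, s) for score, s in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept
```

The geometric mean penalizes a single step into an obstacle much more heavily than an arithmetic mean would. That behavior suits a safety-layer use, where one infeasible waypoint should disqualify a predicted intention.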