Curriculum Learning for Vision-and-Language Navigation
This addresses the challenge of inefficient training in embodied AI navigation for researchers, though it is incremental as it builds on existing VLN methods.
The paper tackles the problem of ignoring sample difficulty distribution in Vision-and-Language Navigation (VLN), which degrades agent performance, by proposing a curriculum-based training paradigm that re-arranges the R2R dataset for curriculum training, resulting in significant improvements in performance, generalizability, and training efficiency for state-of-the-art agents without added model complexity.
Vision-and-Language Navigation (VLN) is a task where an agent navigates in an embodied indoor environment under human instructions. Previous works ignore the distribution of sample difficulty and we argue that this potentially degrade their agent performance. To tackle this issue, we propose a novel curriculum-based training paradigm for VLN tasks that can balance human prior knowledge and agent learning progress about training samples. We develop the principle of curriculum design and re-arrange the benchmark Room-to-Room (R2R) dataset to make it suitable for curriculum training. Experiments show that our method is model-agnostic and can significantly improve the performance, the generalizability, and the training efficiency of current state-of-the-art navigation agents without increasing model complexity.