CV · Nov 22, 2023

Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation

arXiv:2311.13209v414.925 citations · h-index: 24 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficient online adaptation for embodied agents in vision-and-language navigation. The contribution is incremental, since it builds on existing test-time adaptation methods.

The paper tackles online Vision-and-Language Navigation (VLN), where agents must adapt to unlabeled test samples while executing instructions. It proposes a Fast-Slow Test-Time Adaptation (FSTTA) approach that balances frequent and occasional updates to handle dynamically changing environments, and it reports performance gains on four benchmarks.

The ability to accurately comprehend natural language instructions and navigate to the target location is essential for an embodied agent. Such agents are typically required to execute user instructions in an online manner, leading us to explore the use of unlabeled test samples for effective online model adaptation. However, for online Vision-and-Language Navigation (VLN), due to the intrinsic nature of inter-sample online instruction execution and intra-sample multi-step action decision, frequent updates can result in drastic changes in model parameters, while occasional updates can make the model ill-equipped to handle dynamically changing environments. Therefore, we propose a Fast-Slow Test-Time Adaptation (FSTTA) approach for online VLN by performing joint decomposition-accumulation analysis for both gradients and parameters in a unified framework. Extensive experiments show that our method obtains impressive performance gains on four popular benchmarks. Code is available at https://github.com/Feliciaxyao/ICML2024-FSTTA.
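The two-timescale idea in the abstract can be illustrated with a minimal sketch. This is not the authors' FSTTA algorithm: the abstract does not specify the decomposition-accumulation analysis, so the sketch substitutes plain averaging. The function name `fast_slow_adapt`, the parameters `fast_every`, `slow_every`, `lr`, and `slow_mix`, and the `grad_fn` callback are all hypothetical, introduced only for illustration.

```python
from typing import Callable, List

Vector = List[float]


def fast_slow_adapt(
    params: Vector,
    grad_fn: Callable[[Vector, int, int], Vector],
    num_samples: int,
    steps_per_sample: int,
    fast_every: int = 2,    # hypothetical: action steps per fast update
    slow_every: int = 2,    # hypothetical: samples per slow update
    lr: float = 0.1,
    slow_mix: float = 0.5,  # hypothetical: how far the slow update moves
) -> Vector:
    """Illustrative fast-slow loop; averaging stands in for the paper's analysis."""
    params = list(params)        # work on a copy, leave the caller's list intact
    anchor = list(params)        # parameters at the last slow update
    history: List[Vector] = []   # parameter states recorded after fast updates

    for s in range(num_samples):              # inter-sample loop (instructions)
        grad_buf: List[Vector] = []           # leftover gradients are dropped per sample
        for t in range(steps_per_sample):     # intra-sample loop (action steps)
            grad_buf.append(grad_fn(params, s, t))
            if len(grad_buf) == fast_every:
                # Fast update: one step along the mean of the accumulated gradients.
                mean_g = [
                    sum(g[i] for g in grad_buf) / len(grad_buf)
                    for i in range(len(params))
                ]
                params = [p - lr * g for p, g in zip(params, mean_g)]
                history.append(list(params))
                grad_buf.clear()

        if (s + 1) % slow_every == 0 and history:
            # Slow update: move from the anchor toward the mean of recent states,
            # which damps drift caused by the frequent fast updates.
            mean_p = [
                sum(h[i] for h in history) / len(history)
                for i in range(len(params))
            ]
            params = [a + slow_mix * (m - a) for a, m in zip(anchor, mean_p)]
            anchor = list(params)
            history.clear()

    return params
```

As a usage example, calling `fast_slow_adapt([0.0], lambda p, s, t: [2.0 * (p[0] - 1.0)], num_samples=4, steps_per_sample=4)` moves the single parameter toward the minimizer at 1.0 of a toy quadratic loss, with each slow update pulling it partway back toward the previous anchor. In a real VLN agent the gradient would presumably come from an unsupervised test-time objective, which the abstract does not name.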

