Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People
This work addresses navigation challenges for blind and low-vision people, but its contribution is incremental because it builds on existing grounded instruction generation methods.
The paper tackles the problem of generating navigation instructions for blind and low-vision individuals in unfamiliar environments. Its results show that large pretrained language models can produce correct and useful instructions, as demonstrated through a sighted user study and supported by insights from BLV users.
Navigating unfamiliar environments presents significant challenges for blind and low-vision (BLV) individuals. In this work, we construct a dataset of images and goals across different scenarios, such as searching through kitchens or navigating outdoors. We then investigate how grounded instruction generation methods can provide contextually relevant navigational guidance to users in these scenarios. Through a sighted user study, we demonstrate that large pretrained language models can produce correct and useful instructions that are perceived as beneficial for BLV users. We also conduct a survey and interviews with four BLV users and gain insights into their preferences for different instructions depending on the scenario.