Invariance is Key to Generalization: Examining the Role of Representation in Sim-to-Real Transfer for Visual Navigation
This paper addresses sim-to-real transfer for visual navigation in robotics. It offers a scalable approach that improves as the pre-training data grows, though the contribution is incremental in that it builds on existing representation ideas.
The paper tackles the challenge of generalization in robot control by proposing that rich and invariant representations are key. It demonstrates that a representation combining depth and semantic information enables a policy trained in simulated indoor scenes to generalize to diverse real-world environments, both indoors and outdoors.
The data-driven approach to robot control has been gathering pace rapidly, yet generalization to unseen task domains remains a critical challenge. We argue that the key to generalization is representations that are (i) rich enough to capture all task-relevant information and (ii) invariant to superfluous variability between the training and the test domains. We experimentally study such a representation -- containing both depth and semantic information -- for visual navigation and show that it enables a control policy trained entirely in simulated indoor scenes to generalize to diverse real-world environments, both indoors and outdoors. Further, we show that our representation reduces the A-distance between the training and test domains, improving the generalization error bound as a result. Our proposed approach is scalable: the learned policy improves continuously, as the foundation models that it exploits absorb more diverse data during pre-training.
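The abstract does not say how the A-distance is estimated. One common estimator is the proxy A-distance, 2(1 - 2e), where e is the held-out error of a classifier trained to tell source-domain features from target-domain features. The sketch below illustrates that estimator only and is an assumption, not the paper's procedure. The nearest-centroid classifier, the function names, and the toy feature vectors are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
def proxy_a_distance(domain_error: float) -> float:
    """Proxy A-distance 2 * (1 - 2 * err) from a domain classifier's held-out error."""
    return 2.0 * (1.0 - 2.0 * domain_error)


def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def domain_classifier_error(src_train, tgt_train, src_test, tgt_test):
    """Error of a nearest-centroid source-vs-target classifier (hypothetical stand-in).

    Ties are assigned to the source domain, so two identical domains
    yield an error of 0.5 (chance level) on balanced test sets.
    """
    c_src, c_tgt = centroid(src_train), centroid(tgt_train)
    mistakes = 0
    for p in src_test:
        # A source sample is misclassified if it lies strictly closer to the target centroid.
        if sq_dist(p, c_src) > sq_dist(p, c_tgt):
            mistakes += 1
    for p in tgt_test:
        # A target sample is misclassified if it is not strictly closer to the target centroid.
        if sq_dist(p, c_tgt) >= sq_dist(p, c_src):
            mistakes += 1
    return mistakes / (len(src_test) + len(tgt_test))


# Toy features (made up for illustration): two well-separated "domains".
sim_feats = [[0.0, 0.0], [0.2, 0.1]]
real_feats = [[5.0, 5.0], [5.2, 5.1]]
err = domain_classifier_error(sim_feats, real_feats, sim_feats, real_feats)
print(proxy_a_distance(err))
```

Under this estimator, a representation that makes simulated and real features harder to tell apart pushes the classifier error toward 0.5 and the proxy distance toward 0. That is the direction in which a domain-adaptation generalization bound tightens. In practice the classifier would be trained and evaluated on disjoint splits of features extracted from the two domains.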