RO · CV · LG · SY | Dec 20, 2024

SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum

arXiv:2412.16346v2 | 19 citations | h-index: 11 | IEEE Robot Autom Lett
Originality: Highly original
AI Analysis

This work addresses robust, end-to-end visual navigation for drones, enabling deployment in dynamic real-world conditions without fine-tuning.

The authors tackle visual drone navigation with SOUS VIDE, a combined simulator, training approach, and policy architecture that achieves zero-shot sim-to-real transfer. Across 105 hardware experiments, the resulting policies proved robust to mass variations, wind gusts, and environmental changes.
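The summary credits the policies with robustness to mass variations. The abstract attributes this to a learned low-level control module in SV-Net that adapts to the drone's dynamics at runtime. That module is a neural network. The hypothetical sketch below replaces it with a classical recursive least-squares estimate of a single thrust-to-acceleration gain, only to show why runtime adaptation helps. The normalized-thrust convention, the nominal hover command of 0.5, and the class name `ThrustGainEstimator` are assumptions and do not come from the paper.

```python
# Hypothetical sketch: recursive least-squares estimate of a thrust gain,
# standing in for SV-Net's learned adaptation module (which is a neural network).

G = 9.81  # gravitational acceleration, m/s^2


class ThrustGainEstimator:
    """Fits f = k * u from (command, specific force) pairs by least squares.

    u: normalized collective thrust command (assumed convention, 0..1)
    f: specific force along the body z-axis read from the IMU accelerometer, m/s^2
    k: thrust-to-acceleration gain, max thrust / mass; it drops when mass grows
    """

    def __init__(self, k_prior: float, prior_weight: float = 1.0) -> None:
        # The nominal gain enters as a pseudo-measurement with weight prior_weight,
        # so the estimate starts at k_prior and is pulled toward the data over time.
        self.sum_fu = k_prior * prior_weight
        self.sum_uu = prior_weight

    @property
    def k_hat(self) -> float:
        return self.sum_fu / self.sum_uu

    def update(self, u: float, f: float) -> float:
        # Accumulate the normal-equation terms of min_k sum (f - k * u)^2.
        self.sum_fu += f * u
        self.sum_uu += u * u
        return self.k_hat

    def hover_command(self) -> float:
        # Command whose thrust cancels gravity under the current gain estimate.
        return G / self.k_hat


k_nominal = 2.0 * G       # assumed nominal drone: hovers at u = 0.5
true_k = k_nominal / 1.3  # same drone carrying 30% extra mass
est = ThrustGainEstimator(k_prior=k_nominal)

for _ in range(200):
    u = est.hover_command()  # fly at the currently estimated hover thrust
    f = true_k * u           # noise-free accelerometer reading (drag ignored)
    est.update(u, f)

print(f"hover command: nominal 0.500, adapted {est.hover_command():.3f}, "
      f"true {G / true_k:.3f}")
```

Without adaptation, the heavier drone commanded at the nominal hover thrust of 0.5 would sink at about 2.3 m/s². Its true hover command is 1.3 × 0.5 = 0.65. The estimate converges toward that value because an accelerometer measures thrust divided by mass along the body z-axis, independent of attitude (drag neglected).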

We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k image/state-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level thrust and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a learned module for low-level control that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: https://stanfordmsl.github.io/SousVide/.
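The training recipe in the abstract works in two stages. First, state-action pairs are collected from a privileged expert over randomized dynamics and disturbances. Then that expert is distilled into a student policy. The recipe can be illustrated in one dimension, but this is a hypothetical sketch and not the authors' pipeline. It drops images and FiGS rendering entirely. It swaps the MPC for a PD law that knows the true mass, and it replaces SV-Net with a linear least-squares fit. All names, gains, and ranges (`K_NOMINAL`, `KP`, `KD`, `Z_REF`, the ±30% mass randomization, the start-state ranges) are illustrative assumptions. Only the 20 Hz control period mirrors the rate stated in the abstract.

```python
# Hypothetical 1-D sketch of privileged-expert data collection under domain
# randomization, followed by behaviour cloning into a student that never sees
# the randomized mass. A PD law stands in for the paper's MPC expert, and a
# linear least-squares fit stands in for training SV-Net.
import random

G = 9.81               # gravitational acceleration, m/s^2
DT = 0.05              # 20 Hz control period
K_NOMINAL = 2.0 * G    # assumed thrust gain (max thrust / mass) at nominal mass
KP, KD = 4.0, 3.0      # hypothetical expert gains
Z_REF = 1.0            # target altitude, m


def step(z, vz, u, k_true):
    """Vertical point mass, z'' = k*u - g, integrated with semi-implicit Euler."""
    vz += (k_true * u - G) * DT
    return z + vz * DT, vz


def expert_action(err, vel, k_true):
    """Privileged expert: uses the true gain to realize a PD acceleration."""
    return (G + KP * err - KD * vel) / k_true


def collect(num_episodes, steps, rng):
    data = []
    for _ in range(num_episodes):
        k_true = K_NOMINAL / rng.uniform(0.7, 1.3)   # +/-30% mass randomization
        z = Z_REF + rng.uniform(-0.5, 0.5)           # randomized start position
        vz = rng.uniform(-0.5, 0.5)
        for _ in range(steps):
            err = Z_REF - z
            u = expert_action(err, vz, k_true)
            data.append(((err, vz, 1.0), u))  # student features omit k_true
            z, vz = step(z, vz, u, k_true)
    return data


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_student(data):
    """Behaviour cloning: weights w minimizing sum (w . phi - u)^2."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for phi, u in data:
        for i in range(3):
            b[i] += phi[i] * u
            for j in range(3):
                A[i][j] += phi[i] * phi[j]
    return solve(A, b)


def rollout_student(w, k_true, z0, steps=120):
    """Fly the distilled linear policy and return the final altitude."""
    z, vz = z0, 0.0
    for _ in range(steps):
        u = w[0] * (Z_REF - z) + w[1] * vz + w[2]
        z, vz = step(z, vz, u, k_true)
    return z


data = collect(num_episodes=200, steps=60, rng=random.Random(0))
w = fit_student(data)
final_z = rollout_student(w, K_NOMINAL, z0=Z_REF + 0.5)
print(f"student weights {[round(x, 3) for x in w]}, final altitude {final_z:.3f} m")
```

The inverse gain 1/k = m / T_max is linear in mass. Randomizing mass symmetrically about nominal therefore makes the averaged expert close to the nominal one, and the student recovers roughly the nominal PD law. It cannot compensate for a specific heavier drone, because it never observes the mass. That is the gap a runtime adaptation module is meant to close.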
