GaussFly: Contrastive Reinforcement Learning for Visuomotor Policies in 3D Gaussian Fields
This work addresses the sim-to-real gap for autonomous aerial vehicles and offers a robust solution for real-world deployment, although the contribution is incremental, since it builds on existing methods such as 3DGS and contrastive learning.
The paper tackles the challenge of learning visuomotor policies for autonomous aerial vehicles from monocular vision, a setting in which existing end-to-end approaches suffer from low sample efficiency and sim-to-real gaps. It proposes GaussFly, a framework that decouples representation learning from policy optimization using 3D Gaussian Splatting and contrastive learning, achieving superior sample efficiency and enabling zero-shot transfer to unseen real-world environments.
Learning visuomotor policies for Autonomous Aerial Vehicles (AAVs) relying solely on monocular vision is an attractive yet highly challenging paradigm. Existing end-to-end learning approaches map high-dimensional RGB observations directly to action commands and frequently suffer from low sample efficiency and severe sim-to-real gaps caused by the visual discrepancy between the simulated and physical domains. To address these long-standing challenges, we propose GaussFly, a novel framework that explicitly decouples representation learning from policy optimization through a cohesive real-to-sim-to-real paradigm. First, to achieve a high-fidelity real-to-sim transition, we reconstruct training scenes using 3D Gaussian Splatting (3DGS) augmented with explicit geometric constraints. Second, to ensure robust sim-to-real transfer, we leverage these photorealistic simulated environments and employ contrastive representation learning to extract compact, noise-resilient latent features from the rendered RGB images. Because this pre-trained encoder supplies low-dimensional feature inputs, the computational burden on the visuomotor policy is significantly reduced and its robustness to visual noise is enhanced. Extensive experiments in simulated and real-world environments demonstrate that GaussFly achieves superior sample efficiency and asymptotic performance compared to baselines. Crucially, it enables robust, zero-shot policy transfer to unseen real-world environments with complex textures, effectively bridging the sim-to-real gap.
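The abstract does not say which contrastive objective GaussFly uses. Purely as an illustration, the sketch below assumes a SimCLR-style InfoNCE loss computed over two photometrically augmented views of each rendered frame. The augmentation (`augment`), the random linear stand-in encoder (`encode`), the image size, and the 64-dimensional latent are all hypothetical choices, not the authors' implementation. The sketch shows the two ideas the abstract relies on: views of the same frame are pulled together while other frames act as negatives, and the encoder hands the policy a compact feature vector instead of raw pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(images, rng):
    """Hypothetical photometric augmentation: per-image brightness gain plus pixel noise."""
    gain = rng.uniform(0.8, 1.2, size=(images.shape[0], 1))
    noise = rng.normal(0.0, 0.05, size=images.shape)
    return np.clip(images * gain + noise, 0.0, 1.0)

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss: z_a[i] and z_b[i] form the positive pair; other rows of z_b are negatives."""
    # L2-normalise so the dot product becomes cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) similarity matrix
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal.
    return -np.mean(np.diag(log_probs))

# Stand-in "rendered frames" (flattened 32x32 RGB) and a stand-in linear encoder.
# Both are assumptions for illustration; GaussFly's renderer and encoder are not specified here.
N, D_IN, D_LATENT = 16, 3 * 32 * 32, 64
images = rng.uniform(0.0, 1.0, size=(N, D_IN))
W = rng.normal(0.0, 1.0 / np.sqrt(D_IN), size=(D_IN, D_LATENT))

def encode(x):
    """Hypothetical encoder: centre pixels, then project to a low-dimensional latent."""
    return (x - 0.5) @ W

z_a = encode(augment(images, rng))
z_b = encode(augment(images, rng))
loss_matched = info_nce_loss(z_a, z_b)                        # correctly paired views
loss_shuffled = info_nce_loss(z_a, z_b[rng.permutation(N)])  # mostly mismatched pairs

# After pre-training, the frozen encoder's output is what the policy would consume.
policy_input = encode(images)
print(f"matched={loss_matched:.3f} shuffled={loss_shuffled:.3f} "
      f"policy_input={policy_input.shape} (raw pixels: {images.shape})")
```

Correctly paired views should give a much lower loss than shuffled pairs. In a real setup the encoder would be trained to minimise this loss and then frozen, so the policy learns from a 64-dimensional vector rather than from 3,072 raw pixel values.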