RO · Mar 21

Swim2Real: VLM-Guided System Identification for Sim-to-Real Transfer

arXiv:2603.20827 · 44.5 · h-index: 22
Predicted impact top 64% in RO · last 90 days · Originality: Incremental advance
AI Analysis

This paper addresses the sim-to-real gap for aquatic robots, enabling zero-shot RL transfer from video without manual system identification. It appears incremental, however, because it builds on prior multi-stage calibration approaches.

The researchers calibrate a robotic fish simulator from swimming videos with Swim2Real, a pipeline that uses vision-language model feedback to tune all 16 parameters simultaneously, with no hand-designed search stages. Velocity error is 43% lower than that of the next-best method (MAE = 7.4 mm/s), and downstream RL policies swim 12% farther than those trained in BayesOpt-calibrated simulators and 90% farther than those trained in CMA-ES-calibrated ones.

We present Swim2Real, a pipeline that calibrates a 16-parameter robotic fish simulator from swimming videos using vision-language model (VLM) feedback, requiring no hand-designed search stages. Calibrating soft aquatic robots is particularly challenging because nonlinear fluid-structure coupling makes the parameter landscape chaotic, simplified fluid models introduce a persistent sim-to-real gap, and controlled aquatic experiments are difficult to reproduce. Prior work on this platform required three manually tailored stages to handle this complexity. The VLM compares simulated and real videos and proposes parameter updates. A backtracking line search then validates each step size, tripling the accept rate from 14% to 42% by recovering proposals where the direction is correct but the magnitude is too large. Swim2Real calibrates all 16 parameters simultaneously, most closely matching real fish velocities across all motor frequencies (MAE = 7.4 mm/s, 43% lower than the next-best method), with zero outlier seeds across five runs. Motor commands from the trained policy transfer to the physical fish at 50 Hz, completing the pipeline from swimming video to real-world deployment. Downstream RL policies swim 12% farther than those from BayesOpt-calibrated simulators and 90% farther than CMA-ES. These results demonstrate that VLM-guided calibration can close the sim-to-real gap for aquatic robots directly from video, enabling zero-shot RL transfer to physical swimmers without manual system identification, a step toward automated, general-purpose simulator tuning for underwater robotics.
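
The backtracking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the acceptance criterion, the shrink factor, or the number of retries, so everything here is an assumption: `backtracking_accept`, `velocity_mae`, the halving factor, and the toy two-parameter "simulator" are hypothetical stand-ins, not the authors' implementation. The sketch only shows how shrinking a proposal can recover an update whose direction is right but whose magnitude is too large.

```python
from typing import Callable, List, Sequence, Tuple


def velocity_mae(sim: Sequence[float], real: Sequence[float]) -> float:
    """Mean absolute error between simulated and real velocities."""
    return sum(abs(s - r) for s, r in zip(sim, real)) / len(real)


def backtracking_accept(
    params: Sequence[float],
    proposal: Sequence[float],
    loss_fn: Callable[[Sequence[float]], float],
    shrink: float = 0.5,      # assumed shrink factor
    max_shrinks: int = 4,     # assumed retry budget
) -> Tuple[List[float], float]:
    """Try the proposed update at full size, then at progressively smaller
    scales. Return (new_params, accepted_scale), or (params, 0.0) if no
    scale lowers the loss."""
    base_loss = loss_fn(params)
    scale = 1.0
    for _ in range(max_shrinks + 1):
        candidate = [p + scale * d for p, d in zip(params, proposal)]
        if loss_fn(candidate) < base_loss:
            return candidate, scale
        scale *= shrink
    return list(params), 0.0


# Toy stand-in for the simulator: the "simulated velocities" are just the
# parameters themselves, and the real velocities are fixed targets.
REAL = [1.0, 2.0]


def toy_loss(p: Sequence[float]) -> float:
    return velocity_mae(p, REAL)


start = [0.0, 0.0]

# Correct direction, magnitude four times too large: accepted after shrinking.
new_params, scale = backtracking_accept(start, [4.0, 8.0], toy_loss)

# Wrong direction: no scale helps, so the proposal is rejected.
kept_params, rejected_scale = backtracking_accept(start, [-1.0, -2.0], toy_loss)
```

In this toy case the oversized proposal is rejected at full size and at half size, then accepted at a quarter, while the wrong-direction proposal is rejected at every scale. In a real pipeline each loss evaluation would be a simulator rollout, so the retry budget trades compute against accept rate.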

