CV | May 30

OptiWorld: Optimal Control for Video World Generation under Physical Constraints

arXiv:2606.00499 | 67.2 | h-index: 1
AI Analysis

For video generation and world modeling, this work addresses the lack of controllable, physically consistent dynamics by combining optimal control with generative models.

OptiWorld integrates optimal control into video generation at inference time to produce physically consistent and task-optimal dynamics. It extracts compact world states, plans optimal trajectories under constraints, and renders videos conditioned on those trajectories, achieving improved dynamics in goal-conditioned generation, editing, and counterfactual tasks.

Video generation models are becoming a scalable form of world models, but they mainly generate plausible motion rather than proactively control or optimize the underlying dynamics. As a result, an object in the generated video may follow trajectories that are unsafe, not smooth, inefficient, or physically inconsistent. In this work, we propose OptiWorld, a framework that brings classical optimal control into video generation at inference time. OptiWorld first extracts a compact, task-relevant world state, then plans an optimal trajectory under physical constraints, and finally renders the video conditioned on this trajectory. We formulate planning as a geometric problem on a continuous manifold, which converts 3D geometry and task-dependent physical constraints into a unified planning geometry. By adding this optimal-control layer, OptiWorld generates videos with preferable dynamics, demonstrating strong potential in multiple tasks including goal-conditioned image-to-video generation, video dynamics editing, and counterfactual generation.
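
The planning step described above can be illustrated with a toy, discrete stand-in. This is not the authors' method: the abstract formulates planning on a continuous manifold, whereas the sketch below swaps in Dijkstra's algorithm over a small 2D grid. The idea it shares with the abstract is only that geometry and constraints are folded into a single cost field, so the cheapest path through that field is the planned trajectory. The names `plan_path` and `cost_field`, the 4-connected grid, and the per-cell costs are all assumptions made for illustration.

```python
import heapq
import math


def plan_path(cost, start, goal):
    """Cheapest 4-connected path on a grid (Dijkstra).

    cost[r][c] is the price of entering cell (r, c); math.inf marks a
    forbidden cell (a hard constraint). Returns (path, total_cost), or
    (None, math.inf) when the goal cannot be reached.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale queue entry, a cheaper route was already found
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and math.isfinite(cost[nr][nc]):
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None, math.inf
    # Walk parent links back from the goal to recover the trajectory.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path, dist[goal]


# Hypothetical 3x3 cost field: the centre cell is an obstacle.
cost_field = [
    [1, 1, 1],
    [1, math.inf, 1],
    [1, 1, 1],
]
path, total = plan_path(cost_field, (0, 0), (2, 2))
```

In a pipeline like the one the abstract outlines, a trajectory such as `path` would be handed to the video renderer as conditioning. Raising the cost of individual cells, rather than forbidding them outright, is one simple way to express soft preferences such as keeping clear of an object.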

