IPR-1: Interactive Physical Reasoner
This work addresses the challenge of improving physical reasoning in AI agents for interactive environments, though its contribution is incremental because it builds on existing VLM and world-model methods.
The paper asks how agents can acquire human-like physical reasoning through interaction. It proposes IPR (Interactive Physical Reasoner), which combines world-model rollouts with a VLM's policy and a physics-centric action code (PhysCode). The results show that IPR performs robustly across three reasoning levels (Survival, Curiosity, Utility), matches GPT-5 overall, surpasses it on Curiosity, improves with more training games and interaction steps, and transfers zero-shot to unseen games.
Humans learn by observing, interacting with environments, and internalizing physics and causality. We ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more experience. We study this in a Game-to-Unseen (G2U) setting, curating 1,000+ heterogeneous games with diverse physical and causal mechanisms, and evaluate at three human-like levels, Survival, Curiosity, and Utility, which range from primitive intuition to goal-driven reasoning. Our analysis reveals complementary failures: VLM/VLA agents reason but lack look-ahead in interactive settings, while world models imagine but imitate visual patterns rather than analyze physics and causality. We therefore propose IPR (Interactive Physical Reasoner), which uses world-model rollouts to score and reinforce a VLM's policy, and introduce PhysCode, a physics-centric action code that aligns semantic intent with dynamics to provide a shared action space for prediction and reasoning. Pretrained on 1,000+ games, IPR performs robustly on all three levels, matches GPT-5 overall, and surpasses it on Curiosity. Performance improves with more training games and interaction steps, and the model also transfers zero-shot to unseen games. These results support physics-centric interaction as a path to steadily improving physical reasoning.
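To make the idea of scoring a policy's proposals with world-model rollouts concrete, the following is a minimal toy sketch and not the paper's implementation. Everything in it is an assumption made for illustration: the one-dimensional state, the integer actions standing in for PhysCode tokens, and the hypothetical helpers `toy_world_model`, `toy_policy`, `rollout_score`, and `select_action`. It covers only the scoring step; using the scores to reinforce the policy is omitted.

```python
from typing import Callable, Dict, List, Tuple

State = float
Action = int  # hypothetical discrete action code, a stand-in for a PhysCode token

Model = Callable[[State, Action], Tuple[State, float]]
Policy = Callable[[State], List[Action]]


def toy_world_model(state: State, action: Action) -> Tuple[State, float]:
    """Hypothetical dynamics: the action shifts the state; reward peaks at state 0."""
    next_state = state + action
    return next_state, -abs(next_state)


def toy_policy(state: State) -> List[Action]:
    """Hypothetical stand-in for VLM proposals, ordered by preference."""
    if state > 0:
        return [-1, 0, 1]
    if state < 0:
        return [1, 0, -1]
    return [0, -1, 1]


def rollout_score(state: State, first_action: Action, policy: Policy,
                  model: Model, horizon: int) -> float:
    """Imagine `horizon` steps: take `first_action`, then follow the policy's top choice."""
    total = 0.0
    action = first_action
    for _ in range(horizon):
        state, reward = model(state, action)
        total += reward
        action = policy(state)[0]
    return total


def select_action(state: State, policy: Policy, model: Model,
                  horizon: int = 3) -> Tuple[Action, Dict[Action, float]]:
    """Score every proposed action by its imagined return and pick the best one."""
    candidates = policy(state)
    scores = {a: rollout_score(state, a, policy, model, horizon) for a in candidates}
    best = max(candidates, key=lambda a: scores[a])
    return best, scores


if __name__ == "__main__":
    best, scores = select_action(2.0, toy_policy, toy_world_model, horizon=3)
    print(best, scores)
```

In this toy setting, starting from state 2.0 with a horizon of 3, the action that moves toward 0 receives the highest imagined return. In a real system the scores could serve as a training signal for the policy, but how IPR does this is not specified in the text above.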