Physically Interpretable World Models via Weakly Supervised Representation Learning
This work addresses the need for reliable and generalizable world models in cyber-physical systems, particularly for safety-critical tasks, by improving physical interpretability without requiring ground-truth annotations.
The paper tackles the problem of learning predictive models from high-dimensional sensory observations by introducing Physically Interpretable World Models (PIWM), which align latent representations with physical quantities and constrain their evolution through partially known dynamics. In three case studies (Cart Pole, Lunar Lander, and Donkey Car), PIWM achieves accurate long-horizon prediction and recovers true system parameters.
Learning predictive models from high-dimensional sensory observations is fundamental for cyber-physical systems, yet the latent representations learned by standard world models lack physical interpretability. This limits their reliability, generalizability, and applicability to safety-critical tasks. We introduce Physically Interpretable World Models (PIWM), a framework that aligns latent representations with real-world physical quantities and constrains their evolution through partially known physical dynamics. Physical interpretability in PIWM is defined by two complementary properties: (i) the learned latent state corresponds to meaningful physical variables, and (ii) its temporal evolution follows physically consistent dynamics. To achieve this without requiring ground-truth physical annotations, PIWM employs weak distribution-based supervision that captures state uncertainty naturally arising from real-world sensing pipelines. The architecture integrates a VQ-based visual encoder, a transformer-based physical encoder, and a learnable dynamics model grounded in known physical equations. Across three case studies (Cart Pole, Lunar Lander, and Donkey Car), PIWM achieves accurate long-horizon prediction, recovers true system parameters, and significantly improves physical grounding over purely data-driven models. These results demonstrate the feasibility and advantages of learning physically interpretable world models directly from images under weak supervision.
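To make two of the ideas above concrete, the following is a minimal sketch, not the PIWM implementation. It assumes that a weak label can be modelled as a Gaussian over a physical quantity, so that a KL divergence can serve as the distribution-based supervision signal, and it uses the standard textbook cart-pole equations with explicit Euler integration as the "known dynamics". The unknown parameter (the pole half-length) is recovered by a simple grid search on one-step prediction error, which stands in for the gradient-based training a real model would use. All function names, constants, and candidate values are illustrative assumptions.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for scalar Gaussians.

    Assumption: the weak label is a Gaussian (mu_q, sigma_q) describing an
    uncertain measurement of one physical state variable.
    """
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


def cartpole_step(state, force, pole_half_length, dt=0.02,
                  gravity=9.8, cart_mass=1.0, pole_mass=0.1):
    """One explicit-Euler step of the textbook cart-pole equations.

    state = (x, x_dot, theta, theta_dot); pole_half_length is treated as
    the unknown physical parameter in this sketch.
    """
    x, x_dot, theta, theta_dot = state
    total_mass = cart_mass + pole_mass
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + pole_mass * pole_half_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (gravity * sin_t - cos_t * temp) / (
        pole_half_length * (4.0 / 3.0 - pole_mass * cos_t ** 2 / total_mass))
    x_acc = temp - pole_mass * pole_half_length * theta_acc * cos_t / total_mass
    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)


def rollout(state, forces, pole_half_length):
    """Simulate a trajectory by applying the forces one step at a time."""
    states = [state]
    for force in forces:
        state = cartpole_step(state, force, pole_half_length)
        states.append(state)
    return states


def one_step_error(states, forces, pole_half_length):
    """Sum of squared one-step prediction errors for a candidate parameter."""
    err = 0.0
    for s, f, s_next in zip(states, forces, states[1:]):
        pred = cartpole_step(s, f, pole_half_length)
        err += sum((p - t) ** 2 for p, t in zip(pred, s_next))
    return err


# Synthetic "ground truth" trajectory with a hypothetical true half-length of 0.5.
forces = [10.0 if k % 2 == 0 else -10.0 for k in range(50)]
true_states = rollout((0.0, 0.0, 0.1, 0.0), forces, pole_half_length=0.5)

# Grid search over candidate half-lengths (a stand-in for gradient descent).
candidates = [0.3, 0.4, 0.5, 0.6, 0.7]
best_length = min(candidates,
                  key=lambda L: one_step_error(true_states, forces, L))

# Weak-supervision loss: predicted angle N(0.12, 0.05^2) vs. weak label N(0.10, 0.05^2).
weak_loss = gaussian_kl(0.12, 0.05, 0.10, 0.05)
```

Because the dynamics are fixed in form and only a physical parameter is free, the fitted value has a direct physical meaning. This is the sense in which constraining latent evolution by known equations supports parameter recovery. In an image-based setting the states would come from an encoder rather than from a simulator, and the KL term would tie the encoder's output distribution to the weak labels.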