IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments
This benchmark exposes a wide gap between current AI models and humans in intuitive physics understanding, a capability that is essential for developing more human-like AI systems.
The authors introduce IntPhys 2, a video benchmark that evaluates deep learning models' intuitive physics understanding on four core principles, and find that most state-of-the-art models perform at chance level (50%) while humans achieve near-perfect accuracy.
We present IntPhys 2, a video benchmark designed to evaluate the intuitive physics understanding of deep learning models. Building on the original IntPhys benchmark, IntPhys 2 focuses on four core principles related to macroscopic objects: Permanence, Immutability, Spatio-Temporal Continuity, and Solidity. These principles are inspired by research into the intuitive physical understanding that emerges during early childhood. IntPhys 2 offers a comprehensive suite of tests, based on the violation-of-expectation framework, that challenge models to differentiate between possible and impossible events within controlled and diverse virtual environments. Alongside the benchmark, we provide performance evaluations of several state-of-the-art models. Our findings indicate that while these models demonstrate basic visual understanding, they struggle to grasp intuitive physics across the four principles in complex scenes: most models perform at chance level (50%), in stark contrast to humans, who achieve near-perfect accuracy. This underscores the gap between current models and human-like intuitive physics understanding, and highlights the need for advances in model architectures and training methodologies.
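To make the 50% chance level concrete, here is a minimal sketch of how a violation-of-expectation comparison could be scored. It rests on assumptions that the abstract does not state: that each test item pairs one possible video with one impossible video, and that the model under test produces a scalar "surprise" score per video (for example, a prediction error). The function name `pairwise_accuracy` and the example scores are hypothetical, not part of the benchmark's tooling.

```python
from typing import Sequence, Tuple


def pairwise_accuracy(surprise_pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of pairs in which the impossible video gets the higher surprise.

    Each pair is (surprise_possible, surprise_impossible). Ties count as half
    correct, so a model that gives every video the same score lands at 0.5,
    which is the chance level for a two-way comparison.
    """
    if not surprise_pairs:
        raise ValueError("need at least one (possible, impossible) pair")

    score = 0.0
    for possible, impossible in surprise_pairs:
        if impossible > possible:
            score += 1.0  # model is more surprised by the impossible event
        elif impossible == possible:
            score += 0.5  # no preference: equivalent to a coin flip
    return score / len(surprise_pairs)


# Hypothetical surprise scores for four video pairs (illustrative values only).
pairs = [(0.2, 0.9), (0.5, 0.4), (0.3, 0.3), (0.1, 0.8)]
print(pairwise_accuracy(pairs))  # 0.625
```

Under this assumed scoring, a model with no grasp of the four principles has no reason to rank either video in a pair as more surprising, so its accuracy hovers around 0.5, while near-perfect accuracy means it ranks the impossible video as more surprising in almost every pair.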