Can Image-To-Video Models Simulate Pedestrian Dynamics?
This work addresses the simulation of pedestrian dynamics for applications such as urban planning and autonomous systems. It appears incremental: it applies existing models to a new domain and does not claim major breakthroughs.
The paper investigates whether image-to-video models can generate realistic pedestrian movement patterns in crowded scenes by conditioning on keyframes from trajectory benchmarks, and evaluates their performance using quantitative measures of pedestrian dynamics.
Recent high-performing image-to-video (I2V) models based on variants of the diffusion transformer (DiT) have displayed remarkable inherent world-modeling capabilities by virtue of training on large-scale video datasets. We investigate whether these models can generate realistic pedestrian movement patterns in crowded public scenes. Our framework conditions I2V models on keyframes extracted from pedestrian trajectory benchmarks, then evaluates their trajectory prediction performance using quantitative measures of pedestrian dynamics.
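The abstract does not say which quantitative measures are used. As an illustration only, the sketch below computes two displacement errors that are common in pedestrian trajectory benchmarks: average displacement error (ADE) and final displacement error (FDE). It assumes that a pedestrian's positions have already been extracted from the generated video and aligned in time with the benchmark trajectory. The function name `displacement_errors` and the sample coordinates are hypothetical and not taken from the paper.

```python
import math


def displacement_errors(predicted, reference):
    """Return (ADE, FDE) for two time-aligned 2D trajectories.

    Both arguments are sequences of (x, y) positions, one per time step.
    This is an assumed, generic metric, not the paper's stated protocol.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between predicted and reference position at each step.
    dists = [math.dist(p, r) for p, r in zip(predicted, reference)]
    ade = sum(dists) / len(dists)  # mean error over all time steps
    fde = dists[-1]                # error at the final time step
    return ade, fde


# Hypothetical example: the reference walks along the x-axis,
# the predicted track drifts upward by one unit per step.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predicted = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, reference)
```

Per-step distances are kept in a list so that the same values feed both the average and the final-step error. Any other measure of pedestrian dynamics (for example speed or spacing statistics) would need its own function.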