Advancing Open-source World Models
This work provides an open-source tool for applications in content creation, gaming, and robot learning, though it appears incremental as it builds on existing video generation methods.
The authors address the problem of building a high-fidelity, open-source world model for diverse environments. The model supports real-time interaction, with a latency under 1 second while producing 16 frames per second.
We present LingBot-World, an open-source world simulator built on video generation. Positioned as a top-tier world model, LingBot-World offers the following features. (1) It maintains high fidelity and robust dynamics across a broad spectrum of environments, including realistic scenes, scientific contexts, cartoon styles, and beyond. (2) It sustains a minute-level horizon while preserving contextual consistency over time, a property also known as "long-term memory". (3) It supports real-time interactivity, achieving a latency of under 1 second when producing 16 frames per second. We provide public access to the code and model in an effort to narrow the divide between open-source and closed-source technologies. We believe our release will empower the community with practical applications in areas such as content creation, gaming, and robot learning.
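To put the stated figures in concrete terms, the short sketch below works out what 16 frames per second implies. It is illustrative arithmetic only: the one-minute horizon is an assumption (the abstract says only "minute-level"), and expressing the 1-second latency bound as a frame count is our reading, not a claim made by the authors. The helper names are hypothetical.

```python
FPS = 16               # frame rate stated in the abstract
LATENCY_BOUND_S = 1.0  # upper bound on latency stated in the abstract
HORIZON_S = 60         # assumption: "minute-level" read as exactly one minute


def frame_interval_ms(fps: int) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def frames_in(seconds: float, fps: int) -> int:
    """Number of whole frames produced in the given duration."""
    return int(seconds * fps)


print(frame_interval_ms(FPS))           # 62.5 ms between frames
print(frames_in(HORIZON_S, FPS))        # 960 frames in one minute
print(frames_in(LATENCY_BOUND_S, FPS))  # 16 frames span the 1-second bound
```

Under these assumptions, a one-minute rollout covers 960 frames that must remain mutually consistent, and a response delay just under the stated bound corresponds to fewer than 16 frame intervals.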