WorldScore: A Unified Evaluation Benchmark for World Generation
WorldScore provides a standardized evaluation framework for AI and computer vision researchers and practitioners working on world generation. The contribution is incremental in that it builds on existing scene and video generation methods.
The authors address the lack of a unified benchmark for world generation by introducing WorldScore, which decomposes the task into a sequence of next-scene generation steps with camera trajectory-based layouts. They evaluate 19 models and report key insights and challenges for each category of models.
We introduce the WorldScore benchmark, the first unified benchmark for world generation. We decompose world generation into a sequence of next-scene generation tasks with explicit camera trajectory-based layout specifications, enabling unified evaluation of diverse approaches from 3D and 4D scene generation to video generation models. The WorldScore benchmark encompasses a curated dataset of 3,000 test examples that span diverse worlds: static and dynamic, indoor and outdoor, photorealistic and stylized. The WorldScore metrics evaluate generated worlds through three key aspects: controllability, quality, and dynamics. Through extensive evaluation of 19 representative models, including both open-source and closed-source ones, we reveal key insights and challenges for each category of models. Our dataset, evaluation code, and leaderboard can be found at https://haoyi-duan.github.io/WorldScore/
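The decomposition described above can be illustrated with a minimal sketch. It is not the benchmark's actual data format or API: the `NextSceneTask` class, the `decompose_world` function, and the camera-motion labels such as "move_forward" are hypothetical names chosen only to show how a world specified as an ordered list of scenes could be split into next-scene generation tasks, each paired with a camera trajectory.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NextSceneTask:
    """One next-scene generation step (hypothetical structure, not WorldScore's format)."""
    step: int
    current_scene: str   # description of the scene the model starts from
    next_scene: str      # description of the scene the model must generate
    camera_motion: str   # assumed label for the camera trajectory between the two


def decompose_world(scenes: List[str], camera_motions: List[str]) -> List[NextSceneTask]:
    """Split an ordered list of scenes into consecutive next-scene tasks."""
    # Each transition between adjacent scenes needs exactly one camera motion.
    if len(camera_motions) != len(scenes) - 1:
        raise ValueError("expected one camera motion per scene transition")
    return [
        NextSceneTask(i, scenes[i], scenes[i + 1], camera_motions[i])
        for i in range(len(camera_motions))
    ]


# Illustrative input: three scenes connected by two camera motions.
tasks = decompose_world(
    ["kitchen", "hallway", "garden"],
    ["move_forward", "turn_right"],
)
for task in tasks:
    print(task.step, task.current_scene, "->", task.next_scene, "via", task.camera_motion)
```

Under this framing, a 3D, 4D, or video generation model can be evaluated on the same list of tasks, since each task only fixes the starting scene, the target scene, and the camera trajectory between them.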