WonderJourney: Going from Anywhere to Everywhere
WonderJourney addresses the need for automated, scalable 3D content creation in fields such as virtual reality and gaming, though the contribution appears incremental because it builds on existing text-to-3D generation methods.
The paper tackles the problem of generating perpetual, diverse 3D scenes from any starting point. It introduces WonderJourney, a modular framework that combines an LLM that writes scene descriptions, a text-driven pipeline that generates 3D point clouds, and a VLM that verifies the generated scenes. The result is a coherent sequence of scenes across various types and styles.
We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys". Project website: https://kovenyu.com/WonderJourney/
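The three modules named in the abstract (an LLM that describes scenes, a text-driven point cloud generator, and a VLM that verifies the result) can be pictured as a describe, generate, verify loop. The Python sketch below is only an illustration of that control flow, not the authors' implementation. Every name in it (Scene, generate_journey, describe_next, build_scene, verify, max_retries) is hypothetical, the three modules are passed in as plain callables, and the policy of retrying a rejected scene a fixed number of times and then skipping it is an assumption that the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class Scene:
    """Hypothetical container: a text description plus its point cloud."""
    description: str
    points: List[Point]


def generate_journey(
    start: str,
    describe_next: Callable[[List[str]], str],             # stands in for the LLM
    build_scene: Callable[[str, Optional[Scene]], Scene],  # stands in for the point cloud pipeline
    verify: Callable[[Scene], bool],                       # stands in for the VLM check
    num_scenes: int = 3,
    max_retries: int = 2,
) -> List[Scene]:
    """Chain scenes one after another, keeping only those that pass verification."""
    history: List[str] = [start]   # descriptions visited so far
    journey: List[Scene] = []      # accepted scenes, in order
    for _ in range(num_scenes):
        description = describe_next(history)
        previous = journey[-1] if journey else None
        # Assumed policy: regenerate a rejected scene up to max_retries times.
        for _attempt in range(max_retries + 1):
            candidate = build_scene(description, previous)
            if verify(candidate):
                journey.append(candidate)
                history.append(description)
                break
        # If every attempt is rejected, the scene is skipped (an assumption).
    return journey
```

Passing the modules in as callables mirrors the modular framing of the abstract: any one of the three could be replaced without touching the loop.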