Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
This work lowers the financial barrier to advanced video generation, potentially democratizing access for content creators and researchers. Its contribution is incremental, however: it optimizes existing methods rather than introducing a new paradigm.
The authors tackle the high cost of training top-performing video generation models by developing Open-Sora 2.0, a commercial-level model trained for only $200k. According to human evaluations and VBench scores, it performs comparably to leading models such as HunyuanVideo and Runway Gen-3 Alpha.
Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to global leading video generation models including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.