CVApr 15

Seedance 2.0: Advancing Video Generation for World Complexity

Georgia Tech
arXiv:2604.1414883.631 citationsh-index: 30
AI Analysis

For video generation practitioners, this is an incremental improvement over Seedance 1.0/1.5 Pro, offering a unified architecture and multi-modal capabilities but without surpassing existing SOTA.

Seedance 2.0 is a new multi-modal audio-video generation model that supports text, image, audio, and video inputs, achieving performance on par with leading models in expert and user evaluations. It generates 4-15 second audio-video content at 480p/720p resolution.

Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, and large-scale architecture for multi-modal audio-video joint generation. This allows it to support four input modalities: text, image, audio, and video, by integrating one of the most comprehensive suites of multi-modal content reference and editing capabilities available in the industry to date. It delivers substantial, well-rounded improvements across all key sub-dimensions of video and audio generation. In both expert evaluations and public user tests, the model has demonstrated performance on par with the leading levels in the field. Seedance 2.0 supports direct generation of audio-video content with durations ranging from 4 to 15 seconds, with native output resolutions of 480p and 720p. For multi-modal inputs as reference, its current open platform supports up to 3 video clips, 9 images, and 3 audio clips. In addition, we provide Seedance 2.0 Fast version, an accelerated variant of Seedance 2.0 designed to boost generation speed for low-latency scenarios. Seedance 2.0 has delivered significant improvements to its foundational generation capabilities and multi-modal generation performance, bringing an enhanced creative experience for end users.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes