CV | Dec 18, 2025

Kling-Omni Technical Report

arXiv:2512.16776v1 | 27 citations | h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses the need for unified video creation tools for content creators and researchers, representing an incremental advancement by integrating existing tasks into a single framework.

The paper tackles the problem of synthesizing high-fidelity videos from multimodal inputs. It introduces Kling-Omni, a generalist generative framework that integrates video generation, editing, and reasoning tasks into a single system, and it reports strong capabilities in in-context generation, reasoning-based editing, and multimodal instruction following.

We present Kling-Omni, a generalist generative framework designed to synthesize high-fidelity videos directly from multimodal visual language inputs. Adopting an end-to-end perspective, Kling-Omni bridges the functional separation among diverse video generation, editing, and intelligent reasoning tasks, integrating them into a holistic system. Unlike disjointed pipeline approaches, Kling-Omni supports a diverse range of user inputs, including text instructions, reference images, and video contexts, processing them into a unified multimodal representation to deliver cinematic-quality and highly intelligent video content creation. To support these capabilities, we constructed a comprehensive data system that serves as the foundation for multimodal video creation. The framework is further empowered by efficient large-scale pre-training strategies and infrastructure optimizations for inference. Comprehensive evaluations reveal that Kling-Omni demonstrates exceptional capabilities in in-context generation, reasoning-based editing, and multimodal instruction following. Moving beyond a content creation tool, we believe Kling-Omni is a pivotal advancement toward multimodal world simulators capable of perceiving, reasoning, generating, and interacting with dynamic and complex worlds.
