CV · May 28

KGEdit: Ambiguity-Aware Knowledge Graphs for Training-Free Precise Video Generation and Editing

Mingshu Cai, Miao Zhang, Chenghe Yang, Yixuan Li, Osamu Yoshie, Yuya Ieiri

arXiv:2605.29509 · 69.6 · h-index: 5

Predicted impact: top 42% in CV · last 90 days
Originality: Incremental advance

AI Analysis

This work targets semantic ambiguity and cross-frame inconsistency in text-to-video generation, for users who need precise control without retraining.

KGEdit is a training-free framework for text-to-video generation that uses ambiguity-aware knowledge graphs to resolve semantic ambiguity and improve concept binding. The authors report higher editing precision and temporal stability than existing methods.

In recent years, training-free video generation has progressed remarkably. However, when handling complex textual instructions, existing methods still suffer from semantic ambiguity, incorrect concept binding, and cross-frame inconsistency. To address these issues, we propose KGEdit, a structured semantic control framework for text-to-video (T2V) diffusion models. Specifically, we first construct an ambiguity-aware knowledge graph (AAKG) to disentangle and disambiguate the input prompt, converting it into four types of structured semantics: identity, relation, attribute, and negative constraints. We then design a structured semantic injection module (SSIM) to inject these semantic signals into key layers of the diffusion Transformer, enabling fine-grained semantic control. In addition, we introduce a temporal-aware semantic control (TASC) module that dynamically schedules semantic objectives according to the stage-wise characteristics of the denoising process, further improving semantic alignment and temporal consistency. Experiments show that KGEdit outperforms existing methods in editing precision and temporal stability, while offering higher efficiency and controllability in text-driven interaction scenarios.
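The abstract names four types of structured semantics and a stage-wise schedule but gives no implementation details. The following is a minimal illustrative sketch, not the authors' code. The class `StructuredSemantics`, the function `stage_weights`, the three-stage split, and all weight values are assumptions made here for illustration. The only idea borrowed from outside the abstract is the common observation that early denoising steps tend to fix coarse layout while later steps refine detail.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredSemantics:
    """Hypothetical container for the four semantic types named in the abstract."""
    identity: list[str] = field(default_factory=list)                  # entities, e.g. "cat"
    relation: list[tuple[str, str, str]] = field(default_factory=list) # (subject, predicate, object)
    attribute: list[tuple[str, str]] = field(default_factory=list)     # (entity, attribute)
    negative: list[str] = field(default_factory=list)                  # concepts to suppress


def stage_weights(step: int, total_steps: int) -> dict[str, float]:
    """Return per-type weights for one denoising step.

    Assumed schedule (not from the paper): the first third of the steps
    emphasizes identity and relation, the middle third weights all types
    equally, and the last third emphasizes attribute and negative constraints.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")

    # Integer comparisons avoid floating-point edge cases at stage boundaries.
    if 3 * step < total_steps:
        return {"identity": 1.0, "relation": 1.0, "attribute": 0.25, "negative": 0.5}
    if 3 * step < 2 * total_steps:
        return {"identity": 0.5, "relation": 0.5, "attribute": 0.5, "negative": 0.5}
    return {"identity": 0.25, "relation": 0.25, "attribute": 1.0, "negative": 1.0}


# Example prompt: "a black cat on a sofa, no dog", decomposed by hand.
semantics = StructuredSemantics(
    identity=["cat", "sofa"],
    relation=[("cat", "on", "sofa")],
    attribute=[("cat", "black")],
    negative=["dog"],
)
schedule = [stage_weights(t, 30) for t in range(30)]
```

In a real pipeline these weights would scale whatever signal the injection module feeds into the diffusion Transformer. How KGEdit actually builds the graph and schedules its objectives is specified in the paper, not here.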
