CV | AI | CL | Apr 11, 2024

S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing

arXiv:2404.08111v1 | 1 citation | h-index: 11
Originality: Highly original
AI Analysis

This work addresses face video editing for applications that require high-quality, temporally consistent results. It is an incremental advance that contributes new architectural and optimization components.

The paper proposes S3Editor, a face video editing framework that targets identity preservation, editing faithfulness, and temporal consistency through self-training, semantic disentanglement, and structured sparse optimization. It reports significant improvements on all three.

Face attribute editing plays a pivotal role in various applications. However, existing methods struggle to achieve high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are rooted in the training pipeline, including limited supervision, architecture design, and optimization strategy. In this work, we introduce S3Editor, a Sparse Semantic-disentangled Self-training framework for face video editing. S3Editor is a generic solution that comprehensively addresses these challenges with three key contributions. Firstly, S3Editor adopts a self-training paradigm to enhance the training process through semi-supervision. Secondly, we propose a semantic-disentangled architecture with a dynamic routing mechanism that accommodates diverse editing requirements. Thirdly, we present a structured sparse optimization schema that identifies and deactivates malicious neurons to further disentangle the effects of non-target attributes. S3Editor is model-agnostic and compatible with various editing approaches. Our extensive qualitative and quantitative results affirm that our approach significantly enhances identity preservation, editing fidelity, and temporal consistency.
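
The abstract does not say how the structured sparse optimization is carried out. As a rough illustration of the general idea of deactivating whole neurons, the sketch below applies a group soft-threshold (the proximal step of a group-lasso penalty), treating each row of a weight matrix as one neuron. This is a common structured-sparsity technique swapped in for illustration, not necessarily the method used in S3Editor. The names `group_soft_threshold`, `active_neurons`, and `lam`, and the toy matrix `W`, are assumptions made for this example.

```python
import math


def group_soft_threshold(weights, lam):
    """Shrink each row (one hypothetical neuron) toward zero as a group.

    Rows whose L2 norm is at most `lam` are zeroed entirely, which
    deactivates that neuron. Other rows are scaled down by (1 - lam / norm).
    """
    pruned = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        if norm <= lam:
            # The whole neuron is removed, not just individual weights.
            pruned.append([0.0] * len(row))
        else:
            scale = 1.0 - lam / norm
            pruned.append([w * scale for w in row])
    return pruned


def active_neurons(weights):
    """Return the indices of rows that still have a non-zero weight."""
    return [i for i, row in enumerate(weights) if any(w != 0.0 for w in row)]


# Toy weight matrix (illustrative values only): three neurons, two inputs each.
W = [[3.0, 4.0], [0.1, -0.2], [0.0, 2.0]]
W_sparse = group_soft_threshold(W, lam=1.0)
print(active_neurons(W_sparse))
```

In this toy case the second neuron has a small norm and is switched off as a unit, while the other two are only shrunk. Operating on whole rows is what makes the sparsity structured rather than element-wise.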
