CV, Jan 28

TeleStyle: Content-Preserving Style Transfer in Images and Videos

arXiv:2601.20175v1, 1 citation, h-index: 5, Has Code
AI Analysis

This work addresses style transfer for users who need high-fidelity content preservation, but it is incremental, since it builds on the existing Qwen-Image-Edit model.

The paper tackles the challenge of content-preserving style transfer in images and videos for Diffusion Transformers by introducing TeleStyle, a lightweight model that achieves state-of-the-art performance in style similarity, content consistency, and aesthetic quality.

Content-preserving style transfer, generating stylized outputs based on content and style references, remains a significant challenge for Diffusion Transformers (DiTs) due to the inherent entanglement of content and style features in their internal representations. In this technical report, we present TeleStyle, a lightweight yet effective model for both image and video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model's robust capabilities in content preservation and style customization. To facilitate effective training, we curated a high-quality dataset of distinct specific styles and further synthesized triplets using thousands of diverse, in-the-wild style categories. We introduce a Curriculum Continual Learning framework to train TeleStyle on this hybrid dataset of clean (curated) and noisy (synthetic) triplets. This approach enables the model to generalize to unseen styles without compromising precise content fidelity. Additionally, we introduce a video-to-video stylization module to enhance temporal consistency and visual quality. TeleStyle achieves state-of-the-art performance across three core evaluation metrics: style similarity, content consistency, and aesthetic quality. Code and pre-trained models are available at https://github.com/Tele-AI/TeleStyle
