CVNov 3, 2025

NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation

arXiv:2511.01517v1h-index: 26Has Code
Originality Incremental advance
AI Analysis

This addresses the challenge of generating stylized images from text for creative applications, representing an incremental advance in model training techniques.

The paper tackles the problem of text-to-image generation models failing to capture specific artistic styles by proposing NSYNC, a contrastive learning framework that uses negative synthetic images to improve stylization, showing quantitative and qualitative improvements over baselines.

Current text conditioned image generation methods output realistic looking images, but they fail to capture specific styles. Simply finetuning them on the target style datasets still struggles to grasp the style features. In this work, we present a novel contrastive learning framework to improve the stylization capability of large text-to-image diffusion models. Motivated by the astonishing advance in image generation models that makes synthetic data an intrinsic part of model training in various computer vision tasks, we exploit synthetic image generation in our approach. Usually, the generated synthetic data is dependent on the task, and most of the time it is used to enlarge the available real training dataset. With NSYNC, alternatively, we focus on generating negative synthetic sets to be used in a novel contrastive training scheme along with real positive images. In our proposed training setup, we forward negative data along with positive data and obtain negative and positive gradients, respectively. We then refine the positive gradient by subtracting its projection onto the negative gradient to get the orthogonal component, based on which the parameters are updated. This orthogonal component eliminates the trivial attributes that are present in both positive and negative data and directs the model towards capturing a more unique style. Experiments on various styles of painters and illustrators show that our approach improves the performance over the baseline methods both quantitatively and qualitatively. Our code is available at https://github.com/giddyyupp/NSYNC.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes