Alexey Kirillov

CV
h-index: 15
3 papers
9 citations
Novelty: 50%
AI Score: 32

3 Papers

CV · Apr 8, 2024
YaART: Yet Another ART Rendering Technology

Sergey Kastryulin, Artem Konev, Alexander Shishenya et al.

In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we focus especially on the choice of model and training dataset sizes, aspects that had not previously been systematically investigated for text-to-image cascaded diffusion models. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, both of which are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario for diffusion model training. In terms of quality, YaART is consistently preferred by users over many existing state-of-the-art models.

CV · May 25, 2025
Alchemist: Turning Public Text-to-Image Data into Generative Gold

Valerii Startsev, Alexander Ustyuzhanin, Alexey Kirillov et al.

Pre-training equips text-to-image (T2I) models with broad world knowledge, but this alone is often insufficient to achieve high aesthetic quality and alignment. Consequently, supervised fine-tuning (SFT) is crucial for further refinement. However, its effectiveness depends heavily on the quality of the fine-tuning dataset. Existing public SFT datasets frequently target narrow domains (e.g., anime or specific art styles), and the creation of high-quality, general-purpose SFT datasets remains a significant challenge. Current curation methods are often costly and struggle to identify truly impactful samples. This challenge is further complicated by the scarcity of public general-purpose datasets, as leading models often rely on large, proprietary, and poorly documented internal data, hindering broader research progress. This paper introduces a novel methodology for creating general-purpose SFT datasets by leveraging a pre-trained generative model as an estimator of high-impact training samples. We apply this methodology to construct and release Alchemist, a compact (3,350 samples) yet highly effective SFT dataset. Experiments demonstrate that Alchemist substantially improves the generative quality of five public T2I models while preserving diversity and style. Additionally, we release the weights of the fine-tuned models to the public.

CV · Dec 2, 2024
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models

Khaled Abud, Sergey Lavrushkin, Alexey Kirillov et al.

Diffusion-based models have recently revolutionized image generation, achieving unprecedented levels of fidelity. However, consistent generation of high-quality images remains challenging, partly due to the lack of conditioning mechanisms for perceptual quality. In this work, we propose methods to integrate image quality assessment (IQA) models into diffusion-based generators, enabling quality-aware image generation. We show that diffusion models can learn complex qualitative relationships from both the outputs and the internal activations of IQA models. First, we experiment with gradient-based guidance to optimize image quality directly and show that this method has limited generalizability. To address this, we introduce IQA-Adapter, a novel framework that conditions generation on target quality levels by learning the implicit relationship between images and quality scores. When conditioned on a high target quality, IQA-Adapter can shift the distribution of generated images towards a higher-quality subdomain; conversely, it can be used as a degradation model, generating progressively more distorted images when provided with a lower-quality signal. Under high-quality conditioning, IQA-Adapter achieves up to a 10% improvement across multiple objective metrics, as confirmed by a user preference study, while preserving generative diversity and content. Furthermore, we extend IQA-Adapter to a reference-based conditioning scenario, utilizing the rich activation space of IQA models to transfer highly specific, content-agnostic qualitative features between images.