CV · AI · Oct 13, 2025

Proportion and Perspective Control for Flow-Based Image Generation

arXiv:2510.21763v1 · Has Code
Originality: Incremental advance
AI Analysis

This work gives artists and designers finer control over image generation, though it is an incremental improvement over existing methods.

The paper addresses the limited spatial and geometric control offered by text-to-image diffusion models by introducing two specialized ControlNets, one for proportion and one for perspective. Both provide effective control but show limitations under complex constraints.

While modern text-to-image diffusion models generate high-fidelity images, they offer limited control over the spatial and geometric structure of the output. To address this, we introduce and evaluate two ControlNets specialized for artistic control: (1) a proportion ControlNet that uses bounding boxes to dictate the position and scale of objects, and (2) a perspective ControlNet that employs vanishing lines to control the 3D geometry of the scene. We support the training of these modules with data pipelines that leverage vision-language models for annotation and specialized algorithms for conditioning image synthesis. Our experiments demonstrate that both modules provide effective control but exhibit limitations with complex constraints. Both models are released on HuggingFace: https://huggingface.co/obvious-research
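The abstract does not specify how the bounding boxes and vanishing lines are encoded as conditioning images, so the following is only a minimal sketch of what such inputs could look like. The function names (`draw_boxes`, `draw_vanishing_lines`), the white-on-black single-channel encoding, the normalized box format, and the evenly spaced rays are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np


def draw_boxes(height, width, boxes, thickness=2):
    """Rasterize normalized (x0, y0, x1, y1) boxes as white outlines on black.

    Hypothetical encoding: the paper's actual conditioning format may differ.
    """
    canvas = np.zeros((height, width), dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        # Map normalized coordinates in [0, 1] to pixel indices.
        left, right = int(round(x0 * (width - 1))), int(round(x1 * (width - 1)))
        top, bottom = int(round(y0 * (height - 1))), int(round(y1 * (height - 1)))
        t = thickness
        canvas[top:top + t, left:right + 1] = 255                         # top edge
        canvas[max(bottom - t + 1, 0):bottom + 1, left:right + 1] = 255   # bottom edge
        canvas[top:bottom + 1, left:left + t] = 255                       # left edge
        canvas[top:bottom + 1, max(right - t + 1, 0):right + 1] = 255     # right edge
    return canvas


def draw_vanishing_lines(height, width, vanishing_point, n_lines=12):
    """Draw rays from a vanishing point (x, y in pixels) at evenly spaced angles.

    Hypothetical encoding: real perspective annotations need not be evenly spaced.
    """
    canvas = np.zeros((height, width), dtype=np.uint8)
    vx, vy = vanishing_point
    # Upper bound on the distance from the vanishing point to any canvas corner.
    reach = float(np.hypot(height, width)) + abs(vx) + abs(vy)
    # Sample each ray at roughly half-pixel spacing so no pixel is skipped.
    steps = np.linspace(0.0, reach, int(2 * reach))
    for angle in np.linspace(0.0, 2 * np.pi, n_lines, endpoint=False):
        xs = np.round(vx + steps * np.cos(angle)).astype(int)
        ys = np.round(vy + steps * np.sin(angle)).astype(int)
        inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
        canvas[ys[inside], xs[inside]] = 255
    return canvas
```

An image produced this way would play the role of the control input, alongside the text prompt, in a ControlNet-style setup; how the released models expect their inputs should be checked against the model cards at the HuggingFace link above.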
