CV · AI · LG · Jan 22, 2025

PreciseCam: Precise Camera Control for Text-to-Image Generation

arXiv:2501.12910v1 · 15 citations · h-index: 22 · CVPR
Originality: Highly original
AI Analysis

This paper addresses the lack of precise camera control in text-to-image models, a limitation that matters to artists and creators, and offers an efficient, general solution.

The paper tackles precise camera control in text-to-image generation with a method that uses only four simple camera parameters, eliminating the need for predefined shots, pre-existing geometry, or multi-view data. The authors also introduce a dataset of over 57,000 images with text prompts and ground-truth camera parameters, and their evaluation shows that the system surpasses traditional prompt-engineering approaches.

Images as an artistic medium often rely on specific camera angles and lens distortions to convey ideas or emotions; however, such precise control is missing in current text-to-image models. We propose an efficient and general solution that allows precise control over the camera when generating both photographic and artistic images. Unlike prior methods that rely on predefined shots, we rely solely on four simple extrinsic and intrinsic camera parameters, removing the need for pre-existing geometry, reference 3D objects, and multi-view data. We also present a novel dataset with more than 57,000 images, along with their text prompts and ground-truth camera parameters. Our evaluation shows precise camera control in text-to-image generation, surpassing traditional prompt engineering approaches. Our data, model, and code are publicly available at https://graphics.unizar.es/projects/PreciseCam2024.
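The abstract does not name the four parameters. A common choice in related single-image camera calibration work is roll, pitch, vertical field of view, and a distortion coefficient ξ from the unified spherical camera model, so the sketch below assumes that set; this is an assumption, not something stated on this page. It only illustrates how such parameters determine where a world direction lands in the image, not how the authors condition the generator. `CameraParams`, `focal_from_vfov`, and `project` are hypothetical names, and the axis and sign conventions are choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraParams:
    """Hypothetical container for four camera parameters (assumed set, not from the source)."""
    roll_deg: float   # rotation about the optical axis
    pitch_deg: float  # positive = camera tilted upward (convention chosen here)
    vfov_deg: float   # vertical field of view
    xi: float         # unified spherical model distortion (0 = ideal pinhole)


def focal_from_vfov(vfov_deg: float, xi: float, height: int) -> float:
    """Focal length in pixels so that a ray at vfov/2 lands on the top/bottom image edge."""
    half = math.radians(vfov_deg) / 2.0
    return (height / 2.0) * (math.cos(half) + xi) / math.sin(half)


def _rotate(vec, roll_deg, pitch_deg):
    """World -> camera: apply pitch (about the x-axis), then roll (about the optical z-axis)."""
    x, y, z = vec
    p = math.radians(pitch_deg)
    # Pitching up by p makes the world horizon drop below the image center (y points down).
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)
    r = math.radians(roll_deg)
    x, y = x * math.cos(r) - y * math.sin(r), x * math.sin(r) + y * math.cos(r)
    return x, y, z


def project(direction, cam: CameraParams, width: int, height: int):
    """Project a world direction (x right, y down, z forward) to pixel (u, v), or None if not visible."""
    x, y, z = _rotate(direction, cam.roll_deg, cam.pitch_deg)
    norm = math.sqrt(x * x + y * y + z * z)
    xs, ys, zs = x / norm, y / norm, z / norm  # point on the unit sphere
    # Validity region of the unified model: zs > -xi for xi <= 1, zs > -1/xi for xi > 1.
    limit = -cam.xi if cam.xi <= 1.0 else -1.0 / cam.xi
    if zs <= limit:
        return None
    f = focal_from_vfov(cam.vfov_deg, cam.xi, height)
    denom = zs + cam.xi  # sphere point re-projected from a center shifted by xi
    return f * xs / denom + width / 2.0, f * ys / denom + height / 2.0
```

For example, `project((0.0, 0.0, 1.0), CameraParams(0, 10, 60, 0.2), 640, 480)` returns a point below the image center, because the camera is pitched upward. Setting `xi=0` reduces the model to an ordinary pinhole camera, which is why a single scalar is enough to cover both undistorted and strongly distorted lenses in this parameterization.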
