OrienText: Surface Oriented Textual Image Generation
This work addresses a specific challenge in e-commerce and advertising: generating marketing images with precisely placed text. However, it appears incremental, since it builds on existing diffusion models.
The paper tackles a problem that current text-to-image models struggle with: generating images in which text is accurately placed on complex surfaces such as buildings or banners. It introduces OrienText, a method that uses surface normals as conditional input, and demonstrates its effectiveness on a self-curated dataset.
Textual content in images is crucial in e-commerce, particularly in marketing campaigns, product imaging, and advertising, as well as in the entertainment industry. Current text-to-image (T2I) diffusion models, though proficient at producing high-quality images, often struggle to render text accurately on complex surfaces with varied perspectives, such as angled views of architectural elements like buildings, banners, or walls. In this paper, we introduce the Surface Oriented Textual Image Generation (OrienText) method, which leverages region-specific surface normals as conditional input to a T2I diffusion model. Our approach ensures accurate rendering and correct orientation of the text within the image context. We demonstrate the effectiveness of OrienText on a self-curated dataset of images and compare it against existing textual image generation methods.
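The abstract does not say how the region-specific surface normals are obtained or encoded. As a rough illustration only, the sketch below assumes the normals are estimated from a depth map by finite differences, kept only inside a rectangular text region, and packed into an RGB-style normal map of the kind commonly used as a spatial condition for diffusion models. The functions `normals_from_depth` and `region_normal_condition`, the sign convention, and the `(n + 1) / 2` color encoding are all hypothetical choices, not the OrienText implementation.

```python
import numpy as np


def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Estimate per-pixel unit surface normals from an (H, W) depth map.

    Assumption: the surface is treated as z = f(x, y), whose unnormalized
    normal is (-dz/dx, -dz/dy, 1). Real camera conventions may differ.
    """
    # np.gradient on a 2-D array returns derivatives along axis 0 (y) and axis 1 (x).
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth, dtype=np.float64)], axis=-1)
    # Normalize each pixel's normal to unit length.
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n


def region_normal_condition(depth: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hypothetical conditioning map: normals inside `mask`, zeros elsewhere.

    Normals in [-1, 1] are mapped to [0, 1] per channel, a common normal-map
    encoding (assumed here, not taken from the paper).
    """
    rgb = (normals_from_depth(depth) + 1.0) / 2.0
    return rgb * mask[..., None]


# Usage: a plane whose depth grows to the right, like an angled wall,
# with a square region where text should be rendered.
H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W]
depth = 0.5 * xs
mask = np.zeros((H, W))
mask[16:48, 16:48] = 1.0
cond = region_normal_condition(depth, mask)
```

Restricting the map to the masked region mirrors the "region-specific" wording of the abstract: the model would only receive orientation cues where the text is meant to appear. How such a map is actually injected into the diffusion model is not described in this summary.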