Automatic Text Box Placement for Supporting Typographic Design
This work addresses layout design efficiency for designers and advertisers, but its contribution is incremental: it compares existing methods on a single dataset.
The study tackled automated text box placement in incomplete layouts for advertisements and web pages, finding that standard Transformer-based models generally outperformed Vision and Language Model approaches, with specific challenges noted for small text or dense layouts.
In layout design for advertisements and web pages, balancing visual appeal and communication efficiency is crucial. This study examines automated text box placement in incomplete layouts, comparing a standard Transformer-based method, a small Vision and Language Model (Phi3.5-vision), a large pretrained VLM (Gemini), and an extended Transformer that processes multiple images. Evaluations on the Crello dataset show the standard Transformer-based models generally outperform VLM-based approaches, particularly when incorporating richer appearance information. However, all methods face challenges with very small text or densely populated layouts. These findings highlight the benefits of task-specific architectures and suggest avenues for further improvement in automated layout design.
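As a rough illustration of how a predicted text box might be scored against a reference box, the sketch below computes intersection over union (IoU) on normalized coordinates. The abstract does not name the evaluation metric, so IoU is an assumption here, and the `Box` type, its fields, and the example values are hypothetical rather than taken from the study or from the Crello dataset schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box in normalized canvas coordinates (0..1)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    overlap_w = max(0.0, min(a.left + a.width, b.left + b.width) - max(a.left, b.left))
    overlap_h = max(0.0, min(a.top + a.height, b.top + b.height) - max(a.top, b.top))
    intersection = overlap_w * overlap_h
    union = a.area + b.area - intersection
    return intersection / union if union > 0 else 0.0


def shifted(box: Box, dx: float) -> Box:
    """Return a copy of the box moved horizontally by dx."""
    return Box(box.left + dx, box.top, box.width, box.height)


# Illustrative values only (assumed, not from the paper).
large_box = Box(left=0.2, top=0.2, width=0.6, height=0.4)
small_box = Box(left=0.2, top=0.2, width=0.1, height=0.04)

# Apply the same absolute placement error to both boxes.
large_iou = iou(large_box, shifted(large_box, 0.05))
small_iou = iou(small_box, shifted(small_box, 0.05))
print(f"large box IoU: {large_iou:.3f}, small box IoU: {small_iou:.3f}")
```

Under this assumed metric, the same absolute offset costs a small box far more overlap than a large one, which is one plausible reason very small text could be harder to place well; the study itself may measure placement differently.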