Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners
This work addresses the time cost and limited scalability of manual e-commerce banner creation, although its contribution is incremental because it chains existing models.
The paper tackles the labor-intensive manual creation of e-commerce banners. It proposes a method in which a large language model converts user interaction data into prompts for a text-to-image model, and the authors report that the resulting personalized banners are of high quality.
Text-to-image models such as Stable Diffusion have opened a plethora of opportunities for generating art. Recent literature has surveyed the use of text-to-image models for enhancing the work of many creative artists. Many e-commerce platforms employ a manual process to generate these banners, which is time-consuming and does not scale well. In this work, we demonstrate the use of text-to-image models for generating personalized web banners with dynamic content for online shoppers based on their interactions. The novelty in this approach lies in converting users' interaction data into meaningful prompts without human intervention. To this end, we utilize a large language model (LLM) to systematically extract a tuple of attributes from item meta-information. The attributes are then passed to a text-to-image model via prompt engineering to generate images for the banner. Our results show that the proposed approach can create high-quality personalized banners for users.
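The pipeline in the abstract has three stages. It starts from the items a user interacted with, uses an LLM to extract an attribute tuple from each item's meta-information, and then applies a prompt template to build the input for a text-to-image model. The Python sketch below illustrates these stages under several assumptions that do not come from the paper. The four-field attribute tuple (`product`, `color`, `style`, `occasion`) is hypothetical. So are the JSON answer format requested from the LLM, the banner prompt template, and every function name shown. The `llm` argument is any callable that maps a prompt string to a completion string. `fake_llm` is a stand-in that returns a fixed answer so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical attribute fields; the paper only says "a tuple of attributes".
FIELDS = ("product", "color", "style", "occasion")


@dataclass(frozen=True)
class BannerAttributes:
    """Attribute tuple extracted from one item's meta-information (assumed fields)."""
    product: str
    color: str
    style: str
    occasion: str


def build_extraction_prompt(item_meta: Dict[str, str]) -> str:
    """Build an LLM prompt asking for the attribute tuple as a JSON object."""
    meta_text = "\n".join(f"{key}: {value}" for key, value in item_meta.items())
    return (
        "Extract the following attributes from the product description "
        f"and answer with a JSON object with keys {list(FIELDS)}.\n"
        f"{meta_text}"
    )


def parse_attributes(llm_output: str) -> BannerAttributes:
    """Parse the LLM's JSON answer; missing keys default to empty strings."""
    data = json.loads(llm_output)
    return BannerAttributes(**{f: str(data.get(f, "")).strip() for f in FIELDS})


def build_image_prompt(attrs: BannerAttributes) -> str:
    """Template-based prompt engineering for a text-to-image model (hypothetical template)."""
    return (
        f"e-commerce web banner featuring a {attrs.color} {attrs.product}, "
        f"{attrs.style} style, {attrs.occasion} theme, studio lighting, high quality"
    )


def banner_prompts(interacted_items: List[Dict[str, str]],
                   llm: Callable[[str], str]) -> List[str]:
    """Chain the stages: item meta-info -> LLM attributes -> text-to-image prompt."""
    return [
        build_image_prompt(parse_attributes(llm(build_extraction_prompt(meta))))
        for meta in interacted_items
    ]


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM call; always returns the same JSON answer."""
    return json.dumps({"product": "running shoe", "color": "red",
                       "style": "sporty", "occasion": "summer"})


items = [{"title": "Men's lightweight running shoe", "color": "Red", "category": "Footwear"}]
print(banner_prompts(items, fake_llm)[0])
# prints: e-commerce web banner featuring a red running shoe, sporty style, summer theme, studio lighting, high quality
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular provider. In a full system, each returned string would then go to a text-to-image model such as Stable Diffusion, for example through the `diffusers` library's `StableDiffusionPipeline`, to render the banner image.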