Utility-Aware Multimodal Contrastive Learning for Product Image Generation
For e-commerce platforms and sellers, this work addresses the gap between image generation and commercial effectiveness by directly optimizing for demand, though it is an incremental extension of contrastive learning with a new loss function.
Existing generative AI models for product images optimize semantic alignment but not marketplace performance. The authors propose a utility-aware multimodal contrastive learning framework with a novel Utility-Aware InfoNCE loss that incorporates consumer demand, achieving up to 15% improvement in demand-based metrics on Amazon and Airbnb while preserving fidelity and text-image consistency.
Product images strongly influence consumer decision-making in online marketplaces. Empowered by multimodal contrastive learning, generative AI can output images that closely align with text prompts. Yet existing generative AI models do not directly optimize marketplace performance. This is a critical gap, since semantic alignment alone does not guarantee that an image will sell. To address this limitation, we propose a \textit{utility-aware multimodal contrastive learning} framework that incorporates consumer demand into a novel Utility-Aware InfoNCE loss. Optimizing this utility-aware objective guides generation toward images that are both semantically coherent and demand-enhancing. This effect arises directly from a shift in the learned image-text representation space toward demand-driven visual cues, which we also validate through the theoretical bound of the proposed objective. In downstream applications on Amazon and Airbnb, product images generated and edited by our method outperform state-of-the-art models in increasing demand and preserving fidelity, while maintaining text-image consistency. Notably, our utility-aware framework preserves the inverse U-shaped demand patterns observed for attributes such as aesthetics and uniqueness. Human-subject experiments further validate its commercial effectiveness. As generative AI technology continues to evolve, our utility-aware component can be flexibly embedded into emerging generative models to improve their direct commercial applicability.
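The abstract does not state the form of the Utility-Aware InfoNCE loss, so the following is only a minimal sketch of the general idea, not the authors' objective. It implements the standard symmetric InfoNCE loss over paired image and text embeddings and then reweights each pair by a demand signal. The weighting scheme, the function `demand_weights`, and the parameters `strength` and `temperature` are assumptions introduced here for illustration.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(image_emb, text_emb, temperature=0.07, weights=None):
    """Symmetric InfoNCE over a batch of matched (image, text) pairs.

    Row i of image_emb is assumed to match row i of text_emb.
    If weights is given, the per-pair losses are averaged with those
    weights (an illustrative assumption, not the paper's formulation).
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # cosine similarities, scaled
    idx = np.arange(len(img))
    image_to_text = log_softmax(logits, axis=1)[idx, idx]
    text_to_image = log_softmax(logits, axis=0)[idx, idx]
    per_pair = -0.5 * (image_to_text + text_to_image)
    if weights is None:
        return float(per_pair.mean())
    weights = np.asarray(weights, dtype=float)
    return float((weights * per_pair).sum() / weights.sum())


def demand_weights(demand, strength=1.0):
    """Hypothetical mapping from observed demand to positive pair weights.

    Demand is standardized within the batch and exponentiated, so pairs
    with above-average demand contribute more to the loss.
    """
    d = np.asarray(demand, dtype=float)
    z = (d - d.mean()) / (d.std() + 1e-8)
    return np.exp(strength * z)
```

Under these assumptions, a call such as `info_nce(img, txt, weights=demand_weights(sales))` pulls high-demand image-text pairs together more strongly than low-demand ones, while `strength=0.0` recovers the unweighted loss. A monotone weighting like this one would not by itself capture the inverse U-shaped demand patterns mentioned in the abstract.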