FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization
FashionPose addresses the need for realistic and controllable garment visualization in fashion e-commerce, offering a practical solution for personalized virtual fashion display.
The paper tackles the problem of generating personalized fashion images with diverse poses and lighting conditions from text descriptions, introducing FashionPose as a unified framework that achieves fine-grained pose synthesis and consistent relighting.
Realistic and controllable garment visualization is critical for fashion e-commerce, where users expect personalized previews under diverse poses and lighting conditions. Existing methods often rely on predefined poses, limiting semantic flexibility and illumination adaptability. To address this, we introduce FashionPose, the first unified text-to-pose-to-relighting generation framework. Given a natural language description, our method first predicts a 2D human pose, then employs a diffusion model to generate high-fidelity person images, and finally applies a lightweight relighting module, all guided by the same textual input. By replacing explicit pose annotations with text-driven conditioning, FashionPose enables accurate pose alignment, faithful garment rendering, and flexible lighting control. Experiments demonstrate fine-grained pose synthesis and efficient, consistent relighting, providing a practical solution for personalized virtual fashion display.
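The three-stage flow described in the abstract (text to 2D pose, pose to person image, image to relit image, all conditioned on the same prompt) can be illustrated with a minimal structural sketch. This is not the authors' implementation: the class name `TextToPoseToRelight`, the stage signatures, and the `toy_*` functions are hypothetical placeholders. In a real system each stage would be a learned model (a pose predictor, a diffusion generator, and a relighting module), and images would not be small grayscale grids.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pose = List[Tuple[float, float]]   # 2D keypoints as (x, y) in [0, 1]
Image = List[List[float]]          # grayscale grid, a stand-in for a real image


@dataclass
class TextToPoseToRelight:
    """Chains three stages, each conditioned on the same text prompt."""
    predict_pose: Callable[[str], Pose]
    generate_image: Callable[[str, Pose], Image]
    relight: Callable[[str, Image], Image]

    def __call__(self, prompt: str) -> Image:
        pose = self.predict_pose(prompt)           # stage 1: text -> 2D pose
        image = self.generate_image(prompt, pose)  # stage 2: text + pose -> image
        return self.relight(prompt, image)         # stage 3: text + image -> relit image


# Hypothetical placeholder stages; real stages would be learned models.
def toy_pose(prompt: str) -> Pose:
    # Two fake keypoints; move the second one up if the prompt mentions "raised".
    y = 0.2 if "raised" in prompt else 0.6
    return [(0.5, 0.5), (0.7, y)]


def toy_generate(prompt: str, pose: Pose, size: int = 4) -> Image:
    # Mark each keypoint on an otherwise dark grid.
    img = [[0.1] * size for _ in range(size)]
    for x, y in pose:
        row = min(int(y * size), size - 1)
        col = min(int(x * size), size - 1)
        img[row][col] = 0.5
    return img


def toy_relight(prompt: str, image: Image) -> Image:
    # Global brightness gain chosen from the prompt, clipped to [0, 1].
    gain = 1.5 if "bright" in prompt else 1.0
    return [[min(1.0, v * gain) for v in row] for row in image]


pipeline = TextToPoseToRelight(toy_pose, toy_generate, toy_relight)
result = pipeline("a model with a raised arm under bright studio light")
```

Passing the prompt to every stage mirrors the abstract's statement that all three steps are guided by the same textual input. Because the stages are plain callables, any one of them could be swapped out without touching the others.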