Cross Attention Based Style Distribution for Controllable Person Image Synthesis
This work addresses controllable image generation for applications like virtual try-on, but it appears incremental as it builds on existing pose transfer methods.
The paper tackles controllable person image synthesis by proposing a cross-attention style distribution module that selects and routes source semantic styles to target poses, validated on pose transfer and virtual try-on tasks with quantitative and qualitative results.
The controllable person image synthesis task enables a wide range of applications through explicit control over body pose and appearance. In this paper, we propose a cross attention based style distribution module that computes cross attention between the source semantic styles and the target pose for pose transfer. The module explicitly selects the style represented by each semantic region and distributes it according to the target pose. The attention matrix in cross attention expresses the dynamic similarities between the target pose and the source styles for all semantic regions. It can therefore be used to route the color and texture from the source image, and it is further constrained by the target parsing map to achieve a clearer objective. In addition, to encode the source appearance accurately, self attention among the different semantic styles is also applied. The effectiveness of our model is validated quantitatively and qualitatively on pose transfer and virtual try-on tasks.
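The style distribution idea can be illustrated with a minimal NumPy sketch, assuming single-head scaled dot-product attention, one style vector per semantic region, random projection matrices, and a residual connection after the style self attention. The helper names (`style_self_attention`, `distribute_styles`, `parsing_constraint`) are hypothetical, and modelling the target parsing constraint as a cross-entropy between each position's attention row and its parsing label is an assumption for illustration, not necessarily the authors' formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; returns (output, attention matrix)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

def style_self_attention(styles, w_q, w_k, w_v):
    # Each of the K semantic style vectors attends to all K styles
    # (hypothetical helper; the residual connection is an assumption).
    out, _ = attention(styles @ w_q, styles @ w_k, styles @ w_v)
    return styles + out

def distribute_styles(pose_feats, styles, w_q, w_k, w_v):
    # Queries come from the N target-pose positions, keys/values from the
    # K source semantic styles, so the attention matrix has shape (N, K):
    # each row says how much of each source style is routed to that position.
    return attention(pose_feats @ w_q, styles @ w_k, styles @ w_v)

def parsing_constraint(attn, target_parsing, eps=1e-8):
    # Assumed form of the constraint: cross-entropy between each position's
    # attention distribution over semantics and its target parsing label.
    n = attn.shape[0]
    return -np.mean(np.log(attn[np.arange(n), target_parsing] + eps))

rng = np.random.default_rng(0)
K, N, d = 8, 16 * 16, 32                  # semantics, pose positions, channels
styles = rng.standard_normal((K, d))       # source semantic style codes
pose_feats = rng.standard_normal((N, d))   # flattened target pose features
target_parsing = rng.integers(0, K, size=N)
weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(6)]

styles = style_self_attention(styles, *weights[:3])
out, attn = distribute_styles(pose_feats, styles, *weights[3:])
loss = parsing_constraint(attn, target_parsing)
print(out.shape, attn.shape)  # (256, 32) (256, 8)
```

Because each row of `attn` sums to one over the semantic regions, the parsing constraint pushes every target position to draw its style mainly from the semantic region that the target parsing map assigns to it.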