Sem-CS: Semantic CLIPStyler for Text-Based Image Style Transfer
Sem-CS addresses content mismatch and over-stylization for users of text-based image editing tools, though it is an incremental improvement over CLIPStyler.
The paper tackles the loss of object semantics in text-based image style transfer by proposing Semantic CLIPStyler, which segments the content image and applies style according to a text description. Improved performance is reported on DISTS, NIMA, and user study scores.
CLIPStyler demonstrated image style transfer with realistic textures using only a style text description (instead of requiring a reference style image). However, the ground semantics of objects in the stylized output are lost, either because style spills over onto salient and background objects (content mismatch) or because of over-stylization. To solve this, we propose Semantic CLIPStyler (Sem-CS), which performs semantic style transfer. Sem-CS first segments the content image into salient and non-salient objects and then transfers artistic style based on a given style text description. The semantic style transfer is achieved using a global foreground loss (for salient objects) and a global background loss (for non-salient objects). Our empirical results, including DISTS, NIMA, and user study scores, show that our proposed framework yields superior qualitative and quantitative performance.
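The split into a foreground and a background loss can be illustrated with a minimal sketch. The abstract does not give the exact form of either loss, so this sketch assumes both are CLIPStyler-style directional losses (one minus the cosine similarity between the image-embedding shift and a text direction) computed on masked images. The names `toy_embed`, `fg_text_dir`, and `bg_text_dir` are hypothetical, and the random linear projection only stands in for a real image encoder such as CLIP.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-8):
    """Return 1 - cosine similarity between two 1-D vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def directional_loss(embed, stylized, content, text_dir):
    """Penalize misalignment between the image-embedding shift and a text direction."""
    image_dir = embed(stylized) - embed(content)
    return cosine_distance(image_dir, text_dir)

def semantic_style_loss(embed, stylized, content, mask, fg_text_dir, bg_text_dir):
    """Assumed form: foreground loss on salient pixels plus background loss on the rest."""
    m = mask[..., None].astype(stylized.dtype)  # (H, W, 1), 1 marks salient pixels
    fg = directional_loss(embed, stylized * m, content * m, fg_text_dir)
    bg = directional_loss(embed, stylized * (1 - m), content * (1 - m), bg_text_dir)
    return fg, bg, fg + bg

# Toy setup: a random linear projection replaces the real image encoder (assumption).
rng = np.random.default_rng(0)
H, W, D = 8, 8, 16
proj = rng.standard_normal((H * W * 3, D))

def toy_embed(img):
    """Hypothetical stand-in encoder: flatten the image and project it to D dimensions."""
    return img.reshape(-1) @ proj

content = rng.random((H, W, 3))
stylized = content + 0.1 * rng.standard_normal((H, W, 3))
mask = np.zeros((H, W), dtype=bool)
mask[2:6, 2:6] = True                      # pretend the salient object is a centered square
fg_dir = rng.standard_normal(D)            # placeholder text directions
bg_dir = rng.standard_normal(D)

fg, bg, total = semantic_style_loss(toy_embed, stylized, content, mask, fg_dir, bg_dir)
print(f"foreground={fg:.3f} background={bg:.3f} total={total:.3f}")
```

Because each term only sees its own masked region, changing background pixels leaves the foreground term unchanged, which is the property that would limit style spillover. In a real pipeline the mask would come from a segmentation or saliency model and the text directions from a text encoder; both are outside the scope of this sketch.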