CV · Apr 18, 2025

DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images

arXiv:2504.14108v2 · 1 citation · h-index: 7 · Has Code
Originality: Incremental advance
AI Analysis

The work addresses the need for more controllable, layout-preserving text editing in images for applications such as design and content creation, though it appears incremental because it builds on existing diffusion-based methods.

The paper tackles the problem of controllable multilingual text editing in images, particularly for complex geometric transformations like rotation and warping, by introducing DanceText, a training-free layered framework that achieves superior visual quality on the AnyWord-3M benchmark.

We present DanceText, a training-free framework for multilingual text editing in images, designed to support complex geometric transformations and achieve seamless foreground-background integration. While diffusion-based generative models have shown promise in text-guided image synthesis, they often lack controllability and fail to preserve layout consistency under non-trivial manipulations such as rotation, translation, scaling, and warping. To address these limitations, DanceText introduces a layered editing strategy that separates text from the background, allowing geometric transformations to be performed in a modular and controllable manner. A depth-aware module is further proposed to align appearance and perspective between the transformed text and the reconstructed background, enhancing photorealism and spatial consistency. Importantly, DanceText adopts a fully training-free design by integrating pretrained modules, allowing flexible deployment without task-specific fine-tuning. Extensive experiments on the AnyWord-3M benchmark demonstrate that our method achieves superior performance in visual quality, especially under large-scale and complex transformation scenarios. Code is available at https://github.com/YuZhenyuLindy/DanceText.git.
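The layered idea in the abstract (separate the text, transform it geometrically, then blend it back over the background) can be illustrated with a minimal NumPy sketch. This is not the DanceText implementation: the function names `affine_matrix`, `warp_layer`, and `composite` are hypothetical, the text layer is assumed to be already extracted as an RGBA array, and the depth-aware module and background reconstruction described in the abstract are omitted entirely.

```python
import numpy as np

def affine_matrix(angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0, center=(0.0, 0.0)):
    """3x3 matrix: rotate and scale about `center`, then translate by (tx, ty)."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta) * scale, np.sin(theta) * scale
    cx, cy = center
    return np.array([
        [c, -s, cx - c * cx + s * cy + tx],
        [s,  c, cy - s * cx - c * cy + ty],
        [0.0, 0.0, 1.0],
    ])

def warp_layer(layer, matrix):
    """Warp an (H, W, C) layer using inverse mapping and nearest-neighbour sampling."""
    h, w = layer.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous (x, y, 1) coordinates of every destination pixel.
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # Map each destination pixel back to where it came from in the source layer.
    src = np.linalg.inv(matrix) @ dst
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(layer)
    flat = out.reshape(h * w, -1)  # view into `out`
    flat[valid] = layer[sy[valid], sx[valid]]
    return out

def composite(background, text_rgba):
    """Alpha-blend an RGBA text layer over an RGB background (floats in [0, 1])."""
    alpha = text_rgba[..., 3:4]
    return alpha * text_rgba[..., :3] + (1.0 - alpha) * background

# Toy example: a single opaque red "glyph" pixel moved 2 px right and 1 px down.
background = np.full((5, 5, 3), 0.5)
text = np.zeros((5, 5, 4))
text[1, 1] = [1.0, 0.0, 0.0, 1.0]  # pixel at x=1, y=1
moved = warp_layer(text, affine_matrix(tx=2, ty=1))
result = composite(background, moved)
```

Because the text lives in its own layer, rotation, scaling, and translation reduce to changing the arguments of `affine_matrix`; the background is never resampled. A non-affine warp would need a different coordinate mapping in place of the matrix inverse.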

