CVSep 23, 2024
ControlEdit: A MultiModal Local Clothing Image Editing MethodDi Cheng, YingJie Shi, ShiXin Sun et al.
Multimodal clothing image editing refers to the precise adjustment and modification of clothing images using data such as textual descriptions and visual images as control conditions, which effectively improves the work efficiency of designers and reduces the threshold for user design. In this paper, we propose a new image editing method ControlEdit, which transfers clothing image editing to multimodal-guided local inpainting of clothing images. We address the difficulty of collecting real image datasets by leveraging the self-supervised learning approach. Based on this learning approach, we extend the channels of the feature extraction network to ensure consistent clothing image style before and after editing, and we design an inverse latent loss function to achieve soft control over the content of non-edited areas. In addition, we adopt Blended Latent Diffusion as the sampling method to make the editing boundaries transition naturally and enforce consistency of non-edited area content. Extensive experiments demonstrate that ControlEdit surpasses baseline algorithms in both qualitative and quantitative evaluations.
84.0LOMay 5
A Fast Model Counting Algorithm for Two-Variable Logic with Counting and Modulo Counting QuantifiersShixin Sun, Astrid Klipfel, Ondřej Kuželka et al.
Weighted first-order model counting (WFOMC) is a central task in lifted probabilistic inference: It asks for the weighted sum of all models of a first-order sentence over a finite domain. A long line of work has identified domain-liftable fragments of first-order logic, that is, syntactic classes for which WFOMC can be solved in time polynomial in the domain size. Among them, the two-variable fragment with counting quantifiers, $\mathbf{C}^2$, is one of the most expressive known liftable fragments. Existing algorithms for $\mathbf{C}^2$, however, establish tractability through multi-stage reductions that eliminate counting quantifiers via cardinality constraints, which introduces substantial practical overhead as the domain size grows. In this paper, we introduce IncrementalWFOMC3, a lifted algorithm for WFOMC on $\mathbf{C}^2$ and its modulo counting extension, $\mathbf{C}^2_{\text{mod}}$. Instead of relying on reduction techniques, IncrementalWFOMC3 operates directly on a Scott normal form that retains counting quantifiers throughout inference. This direct treatment yields two main results. First, we derive a tighter data-complexity bound for WFOMC in $\mathbf{C}^2$, reducing the degree of the polynomial from quadratic to linear in the counting parameters. Second, we prove that $\mathbf{C}^2_{\text{mod}}$ is domain-liftable, extending tractability from $\mathbf{C}^2$ to a richer fragment with native modulo counting support. Finally, our empirical evaluation shows that IncrementalWFOMC3 delivers orders-of-magnitude runtime improvements and better scalability than both existing WFOMC algorithms and state-of-the-art propositional model counters.
CVAug 8, 2025
UGD-IML: A Unified Generative Diffusion-based Framework for Constrained and Unconstrained Image Manipulation LocalizationYachun Mi, Xingyang He, Shixin Sun et al.
In the digital age, advanced image editing tools pose a serious threat to the integrity of visual content, making image forgery detection and localization a key research focus. Most existing Image Manipulation Localization (IML) methods rely on discriminative learning and require large, high-quality annotated datasets. However, current datasets lack sufficient scale and diversity, limiting model performance in real-world scenarios. To overcome this, recent studies have explored Constrained IML (CIML), which generates pixel-level annotations through algorithmic supervision. However, existing CIML approaches often depend on complex multi-stage pipelines, making the annotation process inefficient. In this work, we propose a novel generative framework based on diffusion models, named UGD-IML, which for the first time unifies both IML and CIML tasks within a single framework. By learning the underlying data distribution, generative diffusion models inherently reduce the reliance on large-scale labeled datasets, allowing our approach to perform effectively even under limited data conditions. In addition, by leveraging a class embedding mechanism and a parameter-sharing design, our model seamlessly switches between IML and CIML modes without extra components or training overhead. Furthermore, the end-to-end design enables our model to avoid cumbersome steps in the data annotation process. Extensive experimental results on multiple datasets demonstrate that UGD-IML outperforms the SOTA methods by an average of 9.66 and 4.36 in terms of F1 metrics for IML and CIML tasks, respectively. Moreover, the proposed method also excels in uncertainty estimation, visualization and robustness.