Ye Zhu

h-index9

4papers

27citations

Novelty53%

AI Score37

Ranked #92,530 of 194,257 authors (top 48%)#31,088 in CV (top 53%)

4 Papers

2.0CVSep 27, 2024Code

Reducing Semantic Ambiguity In Domain Adaptive Semantic Segmentation Via Probabilistic Prototypical Pixel Contrast

Xiaoke Hao, Shiyu Liu, Chuanbo Feng et al.

Domain adaptation aims to reduce the model degradation on the target domain caused by the domain shift between the source and target domains. Although encouraging performance has been achieved by combining cognitive learning with the self-training paradigm, they suffer from ambiguous scenarios caused by scale, illumination, or overlapping when deploying deterministic embedding. To address these issues, we propose probabilistic proto-typical pixel contrast (PPPC), a universal adaptation framework that models each pixel embedding as a probability via multivariate Gaussian distribution to fully exploit the uncertainty within them, eventually improving the representation quality of the model. In addition, we derive prototypes from probability estimation posterior probability estimation which helps to push the decision boundary away from the ambiguity points. Moreover, we employ an efficient method to compute similarity between distributions, eliminating the need for sampling and reparameterization, thereby significantly reducing computational overhead. Further, we dynamically select the ambiguous crops at the image level to enlarge the number of boundary points involved in contrastive learning, which benefits the establishment of precise distributions for each category. Extensive experimentation demonstrates that PPPC not only helps to address ambiguity at the pixel level, yielding discriminative representations but also achieves significant improvements in both synthetic-to-real and day-to-night adaptation tasks. It surpasses the previous state-of-the-art (SOTA) by +5.2% mIoU in the most challenging daytime-to-nighttime adaptation scenario, exhibiting stronger generalization on other unseen datasets. The code and models are available at https://github.com/DarlingInTheSV/Probabilistic-Prototypical-Pixel-Contrast.

10.2CVMay 11, 2025Code

Multimodal Fake News Detection: MFND Dataset and Shallow-Deep Multitask Learning

Ye Zhu, Yunan Wang, Zitong Yu

Multimodal news contains a wealth of information and is easily affected by deepfake modeling attacks. To combat the latest image and text generation methods, we present a new Multimodal Fake News Detection dataset (MFND) containing 11 manipulated types, designed to detect and localize highly authentic fake news. Furthermore, we propose a Shallow-Deep Multitask Learning (SDML) model for fake news, which fully uses unimodal and mutual modal features to mine the intrinsic semantics of news. Under shallow inference, we propose the momentum distillation-based light punishment contrastive learning for fine-grained uniform spatial image and text semantic alignment, and an adaptive cross-modal fusion module to enhance mutual modal features. Under deep inference, we design a two-branch framework to augment the image and text unimodal features, respectively merging with mutual modalities features, for four predictions via dedicated detection and localization projections. Experiments on both mainstream and our proposed datasets demonstrate the superiority of the model. Codes and dataset are released at https://github.com/yunan-wang33/sdml.

14.1CVDec 6, 2024

The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation

Ruoyu Wang, Huayang Huang, Ye Zhu et al.

In this work, we introduce NoiseQuery as a novel method for enhanced noise initialization in versatile goal-driven text-to-image (T2I) generation. Specifically, we propose to leverage an aligned Gaussian noise as implicit guidance to complement explicit user-defined inputs, such as text prompts, for better generation quality and controllability. Unlike existing noise optimization methods designed for specific models, our approach is grounded in a fundamental examination of the generic finite-step noise scheduler design in diffusion formulation, allowing better generalization across different diffusion-based architectures in a tuning-free manner. This model-agnostic nature allows us to construct a reusable noise library compatible with multiple T2I models and enhancement techniques, serving as a foundational layer for more effective generation. Extensive experiments demonstrate that NoiseQuery enables fine-grained control and yields significant performance boosts not only over high-level semantics but also over low-level visual attributes, which are typically difficult to specify through text alone, with seamless integration into current workflows with minimal computational overhead.

1.2MMJan 27, 2016

Revisiting copy-move forgery detection by considering realistic image with similar but genuine objects

Ye Zhu, Tian-Tsong Ng, Xuanjing Shen et al.

Many images, of natural or man-made scenes often contain Similar but Genuine Objects (SGO). This poses a challenge to existing Copy-Move Forgery Detection (CMFD) methods which match the key points / blocks, solely based on the pair similarity in the scene. To address such issue, we propose a novel CMFD method using Scaled Harris Feature Descriptors (SHFD) that preform consistently well on forged images with SGO. It involves the following main steps: (i) Pyramid scale space and orientation assignment are used to keep scaling and rotation invariance; (ii) Combined features are applied for precise texture description; (iii) Similar features of two points are matched and RANSAC is used to remove the false matches. The experimental results indicate that the proposed algorithm is effective in detecting SGO and copy-move forgery, which compares favorably to existing methods. Our method exhibits high robustness even when an image is operated by geometric transformation and post-processing