Jiangqun Ni

CV
h-index: 7
8 papers
97 citations
Novelty: 56%
AI Score: 50

8 Papers

CV | Sep 22, 2024
Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection

Yuzhen Lin, Wentang Song, Bin Li et al.

Previous studies in deepfake detection have shown promising results when the tested face forgeries come from the same dataset used for training. However, the problem remains challenging when the detector must generalize to forgeries from unseen datasets created by unseen methods. In this work, we present a novel general deepfake detection method, called Curricular Dynamic Forgery Augmentation (CDFA), which jointly trains a deepfake detector with a forgery augmentation policy network. Unlike previous works, we propose to progressively apply forgery augmentations following a monotonic curriculum during training. We further propose a dynamic forgery searching strategy that selects one suitable forgery augmentation operation for each image, varying between training stages, producing a forgery augmentation policy optimized for better generalization. In addition, we propose a novel forgery augmentation named self-shifted blending image to simply imitate the temporal inconsistency of deepfake generation. Comprehensive experiments show that CDFA can significantly improve both the cross-dataset and cross-manipulation performance of various naive deepfake detectors in a plug-and-play way, enabling them to outperform existing methods on several benchmark datasets.
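The two ideas named in this abstract, a monotonic curriculum and a self-shifted blend, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the linear schedule, the function names `curriculum_probability`, `self_shifted_blend` and `maybe_augment`, the one-pixel shifts, and the edge-clamping rule are all hypothetical. The abstract describes a learned policy network, which this sketch replaces with a random choice.

```python
import random


def curriculum_probability(epoch, total_epochs, p_max=1.0):
    """Probability of applying a forgery augmentation at a given epoch.

    Assumed linear schedule: non-decreasing in `epoch`, 0 at the first
    epoch and `p_max` at the last one.
    """
    if total_epochs <= 1:
        return p_max
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return p_max * frac


def self_shifted_blend(image, mask, dx, dy):
    """Blend a grayscale image with a copy of itself shifted by (dx, dy).

    image: 2D list of floats. mask: 2D list of blend weights in [0, 1].
    Positions shifted in from outside the frame reuse the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            src_y = min(max(y - dy, 0), height - 1)
            src_x = min(max(x - dx, 0), width - 1)
            weight = mask[y][x]
            row.append(weight * image[src_y][src_x] + (1.0 - weight) * image[y][x])
        out.append(row)
    return out


def maybe_augment(image, mask, epoch, total_epochs, rng=random):
    """Return (image, label); label 1 marks a pseudo-forged sample."""
    if rng.random() < curriculum_probability(epoch, total_epochs):
        # Hypothetical stand-in for the learned policy: pick a random shift.
        dx, dy = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        return self_shifted_blend(image, mask, dx, dy), 1
    return image, 0
```

With this schedule no sample is augmented in the first epoch and every sample is augmented in the last, so the detector sees progressively more pseudo-forgeries as training proceeds.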

CV | Mar 24
AgentFoX: LLM Agent-Guided Fusion with eXplainability for AI-Generated Image Detection

Yangxin Yu, Yue Zhou, Bin Li et al.

The increasing realism of AI-Generated Images (AIGI) has created an urgent need for forensic tools capable of reliably distinguishing synthetic content from authentic imagery. Existing detectors are typically tailored to specific forgery artifacts (such as frequency-domain patterns or semantic inconsistencies), leading to specialized performance and, at times, conflicting judgments. To address these limitations, we present AgentFoX, a Large Language Model-driven framework that redefines AIGI detection as a dynamic, multi-phase analytical process. Our approach employs a quick-integration fusion mechanism guided by a curated knowledge base comprising calibrated Expert Profiles and contextual Clustering Profiles. During inference, the agent begins with high-level semantic assessment, then transitions to fine-grained, context-aware synthesis of signal-level expert evidence, resolving contradictions through structured reasoning. Instead of returning a coarse binary output, AgentFoX produces a detailed, human-readable forensic report that substantiates its verdict, enhancing interpretability and trustworthiness for real-world deployment. Beyond providing a novel detection solution, this work introduces a scalable agentic paradigm that facilitates intelligent integration of future and evolving forensic tools.

CV | Nov 16, 2025 | Code
Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis

Zeqin Yu, Haotao Xie, Jian Zhang et al.

Existing Text Image Forgery Localization (T-IFL) methods often suffer from poor generalization due to the limited scale of real-world datasets and the distribution gap caused by synthetic data that fails to capture the complexity of real-world tampering. To tackle this issue, we propose Fourier Series-based Tampering Synthesis (FSTS), a structured and interpretable framework for synthesizing tampered text images. FSTS first collects 16,750 real-world tampering instances from five representative tampering types, using a structured pipeline that records human-performed editing traces via multi-format logs (e.g., video, PSD, and editing logs). By analyzing these collected parameters and identifying recurring behavioral patterns at both individual and population levels, we formulate a hierarchical modeling framework. Specifically, each individual tampering parameter is represented as a compact combination of basis operation-parameter configurations, while the population-level distribution is constructed by aggregating these behaviors. Since this formulation draws inspiration from the Fourier series, it enables an interpretable approximation using basis functions and their learned weights. By sampling from this modeled distribution, FSTS synthesizes diverse and realistic training data that better reflect real-world forgery traces. Extensive experiments across four evaluation protocols demonstrate that models trained with FSTS data achieve significantly improved generalization on real-world datasets. The dataset is available at the project page: https://github.com/ZeqinYu/FSTS

CV | Oct 31, 2024
DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection

Fan Nie, Jiangqun Ni, Jian Zhang et al.

With the advancement of deepfake generation techniques, the importance of deepfake detection in protecting multimedia content integrity has become increasingly evident. Recently, temporal inconsistency clues have been explored to improve the generalizability of deepfake video detection. According to our observation, the temporal artifacts of forged videos in terms of motion information usually exhibit quite distinct inconsistency patterns along the horizontal and vertical directions, which can be leveraged to improve the generalizability of detectors. In this paper, a transformer-based framework for Diffusion Learning of Inconsistency Pattern (DIP) is proposed, which exploits directional inconsistencies for deepfake video detection. Specifically, DIP begins with a spatiotemporal encoder to represent spatiotemporal information. A directional inconsistency decoder is then adopted, in which direction-aware attention and inconsistency diffusion are incorporated to explore potential inconsistency patterns and jointly learn the inherent relationships. In addition, the SpatioTemporal Invariant Loss (STI Loss) is introduced to contrast spatiotemporally augmented sample pairs and prevent the model from overfitting to nonessential forgery artifacts. Extensive experiments on several public datasets demonstrate that our method can effectively identify directional forgery clues and achieve state-of-the-art performance.

CV | Apr 7, 2025
Reinforced Multi-teacher Knowledge Distillation for Efficient General Image Forgery Detection and Localization

Zeqin Yu, Jiangqun Ni, Jian Zhang et al.

Image forgery detection and localization (IFDL) is of vital importance, as forged images can spread misinformation that poses potential threats to our daily lives. However, previous methods still struggle to effectively handle forged images processed with diverse forgery operations in real-world scenarios. In this paper, we propose a novel Reinforced Multi-teacher Knowledge Distillation (Re-MTKD) framework for the IFDL task, structured around an encoder-decoder ConvNeXt-UperNet with an Edge-Aware Module, named Cue-Net. First, three Cue-Net models are separately trained for the three main types of image forgeries, i.e., copy-move, splicing, and inpainting. These then serve as the multi-teacher models to train the target student model, also a Cue-Net, through self-knowledge distillation. A Reinforced Dynamic Teacher Selection (Re-DTS) strategy is developed to dynamically assign weights to the involved teacher models, which facilitates specific knowledge transfer and enables the student model to effectively learn both the common and specific natures of diverse tampering traces. Extensive experiments demonstrate that, compared with other state-of-the-art methods, the proposed method achieves superior performance on several recently emerged datasets comprising various kinds of image forgeries.
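As a rough illustration of weighted multi-teacher distillation, the sketch below computes a distillation loss as a weighted sum of KL divergences between each teacher's softened prediction and the student's. It is a generic formulation under stated assumptions, not the Re-MTKD objective: the teacher weights are simply passed in rather than chosen by the reinforced selection strategy, the logits are for a single pixel or sample, and the names `softmax`, `kl_divergence` and `multi_teacher_kd_loss` and the temperature value are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_kd_loss(student_logits, teacher_logits_list, teacher_weights,
                          temperature=2.0):
    """Weighted sum of per-teacher distillation terms.

    teacher_weights stands in for whatever a teacher-selection policy
    would output; here it is just a list of non-negative floats.
    """
    if len(teacher_logits_list) != len(teacher_weights):
        raise ValueError("need exactly one weight per teacher")
    student_probs = softmax(student_logits, temperature)
    loss = 0.0
    for logits, weight in zip(teacher_logits_list, teacher_weights):
        teacher_probs = softmax(logits, temperature)
        loss += weight * kl_divergence(teacher_probs, student_probs)
    return loss
```

Setting a teacher's weight to zero removes its influence entirely, which is the simplest way to see how a selection policy could route a splicing sample to the splicing teacher only.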

MM | Nov 24, 2025
Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie, Jiangqun Ni, Jian Zhang et al.

The widespread application of AIGC content has brought not only unprecedented opportunities but also potential security concerns, e.g., audio-visual deepfakes. Therefore, it is of great importance to develop an effective and generalizable method for multi-modal deepfake detection. Typically, audio-visual correlation learning can expose subtle cross-modal inconsistencies, e.g., audio-visual misalignment, which serve as crucial clues in deepfake detection. In this paper, we reformulate correlation learning with variational Bayesian estimation, where the audio-visual correlation is approximated as a Gaussian-distributed latent variable, and thus develop a novel framework for deepfake detection, i.e., Forgery-aware Audio-Visual Adaptation with Variational Bayes (FoVB). Specifically, given the prior knowledge of pre-trained backbones, we adopt two core designs to estimate audio-visual correlations effectively. First, we exploit various difference convolutions and a high-pass filter to discern local and global forgery traces from both modalities. Second, with the extracted forgery-aware features, we estimate the latent Gaussian variable of audio-visual correlation via variational Bayes. Then, we factorize the variable into modality-specific and correlation-specific ones with an orthogonality constraint, allowing them to better learn intra-modal and cross-modal forgery traces with less entanglement. Extensive experiments demonstrate that our FoVB outperforms other state-of-the-art methods on various benchmarks.
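Estimating a Gaussian latent variable with variational Bayes usually involves two standard ingredients: sampling through the reparameterization trick and a KL regularizer toward a prior. The sketch below shows both for a diagonal Gaussian with a standard normal prior. This is the textbook construction and only an assumption about how such a latent could be handled; the abstract does not state the prior or the parameterization FoVB uses, and the names `sample_latent` and `kl_to_standard_normal` are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng=random):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1).

    mu and log_var are equal-length lists describing a diagonal Gaussian.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))
```

The KL term is zero exactly when the estimated posterior matches the prior, and it grows as the mean moves away from zero or the variance away from one.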

MM | Aug 6, 2019
New Design Paradigm of Distortion Cost Function for Efficient JPEG Steganography

Wenkang Su, Jiangqun Ni, Xianglei Hu et al.

Recently, with the introduction of JPEG phase-aware steganalysis features, e.g., GFR, the design of JPEG steganographic distortion cost functions has shifted toward maintaining statistical undetectability not only in the DCT domain but also in the spatial domain. To tackle this issue, this paper presents a novel paradigm for the design of JPEG steganographic distortion cost functions, which calculates the distortion cost via a generalized Distortion Cost Domain Transformation (DCDT) function. The proposed function comprises the embedding changes in the decompressed pixel block and their corresponding embedding distortion costs per unit change, where the pixel embedding distortion costs are represented by a more general exponential model that aims to allocate the embedding data flexibly. In this way, JPEG steganography can be formulated as the optimization problem of minimizing the overall distortion cost in the decompressed spatial domain, which is equivalent to maximizing statistical undetectability against JPEG phase-aware steganalysis features. Experimental results show that the proposed DCDT equipped with HiLL (a spatial steganographic distortion cost function) is superior to other state-of-the-art JPEG steganographic schemes, e.g., UERD, J-UNIWARD, and GUED, in resisting detection by the JPEG phase-aware feature-based steganalyzers GFR and SCA-GFR. It also rivals BET-HiLL at one order of magnitude lower computational complexity, and could be further improved by considering mutually dependent embedding interactions. In addition, the proposed DCDT is verified to be effective across different image databases and quality factors.

MM | Aug 5, 2019
Image Steganography using Gaussian Markov Random Field Model

Wenkang Su, Jiangqun Ni, Yuanfeng Pan et al.

Recent advances in adaptive steganography show that the performance of image steganographic communication can be improved by incorporating non-additive models that capture the dependencies among adjacent pixels. In this paper, a Gaussian Markov Random Field (GMRF) model with a four-element cross neighborhood is proposed to characterize the interactions among local elements of cover images. By taking advantage of the conditional independence of the GMRF, the problem of secure image steganography is formulated as minimizing the KL-divergence over a series of low-dimensional clique structures associated with the GMRF. The adoption of the proposed GMRF tessellates the cover image into two disjoint subimages, and an alternating iterative optimization scheme is developed to effectively embed the given payload while minimizing the total KL-divergence between cover and stego, i.e., the statistical detectability. Experimental results demonstrate that the proposed GMRF outperforms prior model-based schemes, e.g., MiPOD, and rivals the state-of-the-art HiLL for practical steganography, where knowledge of the selection channel is unavailable to steganalyzers.
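One natural reading of the tessellation into two disjoint subimages is a checkerboard split: with a four-element cross neighborhood (up, down, left, right), every neighbor of a pixel lies on the opposite checkerboard color, so pixels within one subimage are conditionally independent given the other. The sketch below checks that property. The checkerboard interpretation is an assumption drawn from the neighborhood structure, not a detail stated in the abstract, and the function names `cross_neighbors` and `checkerboard_split` are hypothetical.

```python
def cross_neighbors(i, j, height, width):
    """In-bounds up/down/left/right neighbors of pixel (i, j)."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < height and 0 <= c < width]


def checkerboard_split(height, width):
    """Partition pixel coordinates into two interleaved sublattices.

    Returns (even, odd), where a pixel (i, j) is 'even' if i + j is even.
    """
    even, odd = [], []
    for i in range(height):
        for j in range(width):
            (even if (i + j) % 2 == 0 else odd).append((i, j))
    return even, odd


def neighbors_all_in_other_set(height, width):
    """Check that no pixel shares a sublattice with any of its cross neighbors."""
    even, odd = checkerboard_split(height, width)
    even_set, odd_set = set(even), set(odd)
    for own, other in ((even_set, odd_set), (odd_set, even_set)):
        for i, j in own:
            if any(n not in other for n in cross_neighbors(i, j, height, width)):
                return False
    return True
```

Under this split, an alternating scheme can update one sublattice while holding the other fixed, then swap roles, which matches the alternating iterative optimization the abstract describes at a high level.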