Jiwu Huang

CV
h-index: 62
39 papers
1,387 citations
Novelty: 53%
AI Score: 59

39 Papers

CV | Sep 20, 2023 | Code
Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer

Anwei Luo, Rizhao Cai, Chenqi Kong et al.

With the rapid progress of generative models, the current challenge in face forgery detection is how to effectively detect realistic manipulated faces from different unseen domains. Though previous studies show that pre-trained Vision Transformer (ViT) based models can achieve some promising results after full fine-tuning on the Deepfake dataset, their generalization performance is still unsatisfactory. One possible reason is that fully fine-tuned ViT-based models may disrupt the pre-trained features [1, 2] and overfit to some data-specific patterns [3]. To alleviate this issue, we present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm, where the parameters in the pre-trained ViT are kept fixed while the designed adaptive modules are optimized to capture forgery features. Specifically, a global adaptive module is designed to model long-range interactions among input tokens, which takes advantage of the self-attention mechanism to mine global forgery clues. To further explore essential local forgery clues, a local adaptive module is proposed to expose local inconsistencies by enhancing the local contextual association. In addition, we introduce a fine-grained adaptive learning module that emphasizes the common compact representation of genuine faces through relationship learning in fine-grained pairs, driving these proposed adaptive modules to be aware of fine-grained forgery-aware information. Extensive experiments demonstrate that our FA-ViT achieves state-of-the-art results in the cross-dataset evaluation and enhances the robustness against unseen perturbations. In particular, FA-ViT achieves 93.83% and 78.32% AUC scores on the Celeb-DF and DFDC datasets in the cross-dataset evaluation. The code and trained model have been released at: https://github.com/LoveSiameseCat/FAViT.

CV | May 27
SIGMA: Semantic-Difference Instruction-Grounding Mask Annotator for Text-Driven Image Manipulation Localization

Peiyu Zhuang, Jianquan Yang, Haodong Li et al.

Text-driven image editing has advanced rapidly, but reliably localizing these manipulations requires image manipulation localization (IML) models trained on large pixel-annotated datasets, and there is still no low-cost way to obtain such training data at scale. We observe that these data already exist in disguise: public editing datasets contain millions of (original, edited) pairs that are structurally identical to IML training samples, lacking only pixel-level masks. Recovering these masks automatically is non-trivial: pixel differencing is overwhelmed by diffusion-induced perturbations across all pixels, and instruction-only grounding localizes only what the prompt describes, missing unintended editor side-effects. We propose SIGMA (Semantic-difference Instruction-Grounding Mask Annotator), which performs semantic-feature differencing in a vision foundation backbone and injects an instruction-derived spatial prior into this visual stream via bidirectional cross-modal refinement, amplifying the difference signal at intended-edit regions when the editor faithfully realizes user intent. SIGMA is trained in two complementary stages: Stage I supervises on inpainting masks; Stage II closes the diffusion-domain shift via VAE-roundtrip noise calibration, EMA self-training, and an edit-noise disentanglement loss. SIGMA outperforms existing automatic mask generators on five benchmarks (+12.20% F1, +11.16% IoU). When applied to public editing corpora, it produces a ~1.1M IML training set that improves six diverse detectors by +18.34% F1 across five datasets, turning previously unused editing data into a model-agnostic supervisory resource for IML. We'll release the full codebase as soon as the paper is accepted.

CV | Apr 24, 2023
Beyond the Prior Forgery Knowledge: Mining Critical Clues for General Face Forgery Detection

Anwei Luo, Chenqi Kong, Jiwu Huang et al.

Face forgery detection is essential in combating malicious digital face attacks. Previous methods mainly rely on prior expert knowledge to capture specific forgery clues, such as noise patterns, blending boundaries, and frequency artifacts. However, these methods tend to get trapped in local optima, resulting in limited robustness and generalization capability. To address these issues, we propose a novel Critical Forgery Mining (CFM) framework, which can be flexibly assembled with various backbones to boost their generalization and robustness performance. Specifically, we first build a fine-grained triplet and suppress specific forgery traces through prior knowledge-agnostic data augmentation. Subsequently, we propose a fine-grained relation learning prototype to mine critical information in forgeries through instance and local similarity-aware losses. Moreover, we design a novel progressive learning controller to guide the model to focus on principal feature components, enabling it to learn critical forgery features in a coarse-to-fine manner. The proposed method achieves state-of-the-art forgery detection performance under various challenging evaluation settings.

CV | Nov 8, 2022
ReLoc: A Restoration-Assisted Framework for Robust Image Tampering Localization

Peiyu Zhuang, Haodong Li, Rui Yang et al.

With the spread of tampered images, locating the tampered regions in digital images has drawn increasing attention. The existing image tampering localization methods, however, suffer from severe performance degradation when the tampered images are subjected to some post-processing, as the tampering traces would be distorted by the post-processing operations. The poor robustness against post-processing has become a bottleneck for the practical applications of image tampering localization techniques. In order to address this issue, this paper proposes a novel restoration-assisted framework for image tampering localization (ReLoc). The ReLoc framework mainly consists of an image restoration module and a tampering localization module. The key idea of ReLoc is to use the restoration module to recover a high-quality counterpart of the distorted tampered image, such that the distorted tampering traces can be re-enhanced, facilitating the tampering localization module to identify the tampered regions. To achieve this, the restoration module is optimized not only with the conventional constraints on image visual quality but also with a forensics-oriented objective function. Furthermore, the restoration module and the localization module are trained alternately, which can stabilize the training process and is beneficial for improving the performance. The proposed framework is evaluated by fighting against JPEG compression, the most commonly used post-processing. Extensive experimental results show that ReLoc can significantly improve the robustness against JPEG compression. The restoration module in a well-trained ReLoc model is transferable. Namely, it is still effective when being directly deployed with another tampering localization module.

CV | Jun 12, 2022
STD-NET: Search of Image Steganalytic Deep-learning Architecture via Hierarchical Tensor Decomposition

Shunquan Tan, Qiushi Li, Laiyuan Li et al.

Recent studies show that the majority of existing deep steganalysis models have a large amount of redundancy, which leads to a huge waste of storage and computing resources. Existing model compression methods cannot flexibly compress the convolutional layers in residual shortcut blocks, so a satisfactory shrinking rate cannot be obtained. In this paper, we propose STD-NET, an unsupervised deep-learning architecture search approach via hierarchical tensor decomposition for image steganalysis. Our proposed strategy is not restricted by the various residual connections, since it does not change the number of input and output channels of the convolution block. We propose a normalized distortion threshold to evaluate the sensitivity of each involved convolutional layer of the base model, which guides STD-NET to compress the target network in an efficient and unsupervised manner, and we obtain two network structures of different shapes with low computation cost and performance similar to the original one. Extensive experiments have confirmed that, on one hand, our model can achieve comparable or even better detection performance in various steganalytic scenarios due to the great adaptivity of the obtained network architecture. On the other hand, the experimental results also demonstrate that our proposed strategy is more efficient and can remove more redundancy compared with previous steganalytic network compression methods.

CV | Sep 5, 2022
Forensicability Assessment of Questioned Images in Recapturing Detection

Changsheng Chen, Lin Zhao, Rizhao Cai et al.

Recapture detection of face and document images is an important forensic task. With deep learning, the performances of face anti-spoofing (FAS) and recaptured document detection have been improved significantly. However, the performances are not yet satisfactory on samples with weak forensic cues. The amount of forensic cues can be quantified to allow a reliable forensic result. In this work, we propose a forensicability assessment network to quantify the forensicability of the questioned samples. The low-forensicability samples are rejected before the actual recapturing detection process to improve the efficiency of recapturing detection systems. We first extract forensicability features related to both image quality assessment and forensic tasks. By exploiting domain knowledge of the forensic application in image quality and forensic features, we define three task-specific forensicability classes and the initialized locations in the feature space. Based on the extracted features and the defined centers, we train the proposed forensic assessment network (FANet) with cross-entropy loss and update the centers with a momentum-based update method. We integrate the trained FANet with practical recapturing detection schemes in face anti-spoofing and recaptured document detection tasks. Experimental results show that, for a generic CNN-based FAS scheme, FANet reduces the EERs from 33.75% to 19.23% under ROSE to IDIAP protocol by rejecting samples with the lowest 30% forensicability scores. The performance of FAS schemes is poor in the rejected samples, with EER as high as 56.48%. Similar performances in rejecting low-forensicability samples have been observed for the state-of-the-art approaches in FAS and recaptured document detection tasks. To the best of our knowledge, this is the first work that assesses the forensicability of recaptured document images and improves the system efficiency.
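The two mechanical steps named in this abstract, rejecting the samples with the lowest forensicability scores and updating class centers with momentum, can be sketched in a few lines. This is a minimal illustration, not the authors' FANet code: the function names, the momentum value, and the use of a plain batch mean are assumptions made here for the example.

```python
def reject_low_forensicability(scores, reject_fraction=0.3):
    """Return indices of the samples kept after dropping the lowest-scoring fraction."""
    n_reject = int(len(scores) * reject_fraction)
    # Sort indices by ascending score; the first n_reject entries are the weakest samples.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rejected = set(order[:n_reject])
    return [i for i in range(len(scores)) if i not in rejected]


def momentum_update(center, batch_features, momentum=0.9):
    """Move a class center toward the mean of the batch features assigned to it.

    The momentum value and the batch-mean target are assumptions for this sketch.
    """
    dim = len(center)
    batch_mean = [sum(f[d] for f in batch_features) / len(batch_features) for d in range(dim)]
    return [momentum * c + (1.0 - momentum) * m for c, m in zip(center, batch_mean)]


# Hypothetical forensicability scores for ten questioned samples.
scores = [0.9, 0.1, 0.5, 0.3, 0.8, 0.2, 0.7, 0.6, 0.4, 1.0]
kept = reject_low_forensicability(scores, reject_fraction=0.3)
new_center = momentum_update([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], momentum=0.9)
```

Only the samples in `kept` would be passed on to the downstream recapturing detector.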

CV | May 11 | Code
Adversarial Attacks Against MLLMs via Progressive Resolution Processing and Adaptive Feature Alignment

Haobo Wang, Xiaorong Ma, Weiqi Luo et al.

Adversarial perturbations can mislead Multimodal Large Language Models (MLLMs) into recognizing a benign image as a specific target object, posing serious risks in safety-critical scenarios such as autonomous driving and medical diagnosis. This makes transfer-based targeted attacks crucial for understanding and improving black-box MLLM robustness. Existing transfer-based targeted attack methods typically rely on the final global features of the surrogate encoder and anchor optimization to original-resolution target crops, which limits their transferability and robustness. To address these challenges, we propose Progressive Resolution Processing and Adaptive Feature Alignment (PRAF-Attack), a targeted transfer-based attack framework that integrates multi-scale global semantic guidance with robust intermediate-layer local alignment. Unlike prior methods that align only the surrogate encoder's final layer, we design an adaptive feature alignment strategy that leverages intermediate representations to enhance transferability. Specifically, we introduce an adaptive intermediate layer selection mechanism to identify transferable hierarchical features across surrogate ensembles via gradient consistency, along with an adaptive patch-level optimization strategy that preserves highly correlated local regions through efficient patch filtering. To overcome the reliance on fixed original-resolution target crops, we propose a progressive resolution processing strategy that gradually refines optimization from coarse to fine, enabling the attack to better exploit target information at multiple scales and achieve stronger transferability. We evaluate PRAF-Attack on a diverse suite of black-box MLLMs, including six open-source models and six closed-source commercial APIs. Compared with seven state-of-the-art targeted attack baselines, the proposed PRAF-Attack consistently achieves superior transferability.

CV | Oct 16, 2023
Evading Detection Actively: Toward Anti-Forensics against Forgery Localization

Long Zhuo, Shenghai Luo, Shunquan Tan et al.

Anti-forensics seeks to eliminate or conceal traces of tampering artifacts. Typically, anti-forensic methods are designed to deceive binary detectors and persuade them to misjudge the authenticity of an image. However, to the best of our knowledge, no attempts have been made to deceive forgery detectors at the pixel level and mis-locate forged regions. Traditional adversarial attack methods cannot be directly used against forgery localization due to the following defects: 1) they tend to just naively induce the target forensic models to flip their pixel-level pristine or forged decisions; 2) their anti-forensics performance tends to be severely degraded when faced with unseen forensic models; 3) they lose validity once the target forensic models are retrained with the anti-forensics images generated by them. To tackle the three defects, we propose SEAR (Self-supErvised Anti-foRensics), a novel self-supervised and adversarial training algorithm that effectively trains deep-learning anti-forensic models against forgery localization. SEAR sets a pretext task to reconstruct perturbation for self-supervised learning. In adversarial training, SEAR employs a forgery localization model as a supervisor to explore tampering features and constructs a deep-learning concealer to erase corresponding traces. We have conducted large-scale experiments across diverse datasets. The experimental results demonstrate that, through the combination of self-supervised learning and adversarial learning, SEAR successfully deceives the state-of-the-art forgery localization methods and tackles the three defects of traditional adversarial attack methods mentioned above.

CR | Dec 13, 2023 | Code
Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4

Pei Yan, Shunquan Tan, Miaohui Wang et al.

Dynamic analysis methods effectively identify shelled, wrapped, or obfuscated malware, thereby preventing them from invading computers. As a significant representation of dynamic malware behavior, the API (Application Programming Interface) sequence, comprised of consecutive API calls, has progressively become the dominant feature of dynamic analysis methods. Though there have been numerous deep learning models for malware detection based on API sequences, the quality of API call representations produced by those models is limited. These models cannot generate representations for unknown API calls, which weakens both the detection performance and the generalization. Further, the concept drift phenomenon of API calls is prominent. To tackle these issues, we introduce a prompt engineering-assisted malware dynamic analysis using GPT-4. In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence. Afterward, the pre-trained language model BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence. Theoretically, this proposed method is capable of generating representations for all API calls, excluding the necessity for dataset training during the generation process. Utilizing the representation, a CNN-based detection model is designed to extract the feature. We adopt five benchmark datasets to validate the performance of the proposed model. The experimental results reveal that the proposed detection algorithm performs better than the state-of-the-art method (TextCNN). Specifically, in cross-database experiments and few-shot learning experiments, the proposed model achieves excellent detection performance and almost a 100% recall rate for malware, verifying its superior generalization performance. The code is available at: github.com/yan-scnu/Prompted_Dynamic_Detection.

CV | Nov 16, 2025 | Code
Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis

Zeqin Yu, Haotao Xie, Jian Zhang et al.

Existing Text Image Forgery Localization (T-IFL) methods often suffer from poor generalization due to the limited scale of real-world datasets and the distribution gap caused by synthetic data that fails to capture the complexity of real-world tampering. To tackle this issue, we propose Fourier Series-based Tampering Synthesis (FSTS), a structured and interpretable framework for synthesizing tampered text images. FSTS first collects 16,750 real-world tampering instances from five representative tampering types, using a structured pipeline that records human-performed editing traces via multi-format logs (e.g., video, PSD, and editing logs). By analyzing these collected parameters and identifying recurring behavioral patterns at both individual and population levels, we formulate a hierarchical modeling framework. Specifically, each individual tampering parameter is represented as a compact combination of basis operation-parameter configurations, while the population-level distribution is constructed by aggregating these behaviors. Since this formulation draws inspiration from the Fourier series, it enables an interpretable approximation using basis functions and their learned weights. By sampling from this modeled distribution, FSTS synthesizes diverse and realistic training data that better reflect real-world forgery traces. Extensive experiments across four evaluation protocols demonstrate that models trained with FSTS data achieve significantly improved generalization on real-world datasets. The dataset is available at the project page: https://github.com/ZeqinYu/FSTS.

CV | Aug 10, 2025 | Code
CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization

Youqi Wang, Shunquan Tan, Rongxuan Peng et al.

The increasing accessibility of image editing tools and generative AI has led to a proliferation of visually convincing forgeries, compromising the authenticity of digital media. In this paper, in addition to leveraging distortions from conventional forgeries, we repurpose the mechanism of a state-of-the-art (SOTA) text-to-image synthesis model by exploiting its internal generative process, turning it into a high-fidelity forgery localization tool. To this end, we propose CLUE (Capture Latent Uncovered Evidence), a framework that employs Low-Rank Adaptation (LoRA) to parameter-efficiently reconfigure Stable Diffusion 3 (SD3) as a forensic feature extractor. Our approach begins with the strategic use of SD3's Rectified Flow (RF) mechanism to inject noise at varying intensities into the latent representation, thereby steering the LoRA-tuned denoising process to amplify subtle statistical inconsistencies indicative of a forgery. To complement the latent analysis with high-level semantic context and precise spatial details, our method incorporates contextual features from the image encoder of the Segment Anything Model (SAM), which is parameter-efficiently adapted to better trace the boundaries of forged regions. Extensive evaluations demonstrate CLUE's SOTA generalization performance, significantly outperforming prior methods. Furthermore, CLUE shows superior robustness against common post-processing attacks and Online Social Networks (OSNs). Code is publicly available at https://github.com/SZAISEC/CLUE.

CV | Aug 10, 2025 | Code
ForensicsSAM: Toward Robust and Unified Image Forgery Detection and Localization Resisting to Adversarial Attack

Rongxuan Peng, Shunquan Tan, Chenqi Kong et al.

Parameter-efficient fine-tuning (PEFT) has emerged as a popular strategy for adapting large vision foundation models, such as the Segment Anything Model (SAM) and LLaVA, to downstream tasks like image forgery detection and localization (IFDL). However, existing PEFT-based approaches overlook their vulnerability to adversarial attacks. In this paper, we show that highly transferable adversarial images can be crafted solely via the upstream model, without accessing the downstream model or training data, significantly degrading the IFDL performance. To address this, we propose ForensicsSAM, a unified IFDL framework with built-in adversarial robustness. Our design is guided by three key ideas: (1) To compensate for the lack of forgery-relevant knowledge in the frozen image encoder, we inject forgery experts into each transformer block to enhance its ability to capture forgery artifacts. These forgery experts are always activated and shared across all input images. (2) To detect adversarial images, we design a lightweight adversary detector that learns to capture structured, task-specific artifacts in the RGB domain, enabling reliable discrimination across various attack methods. (3) To resist adversarial attacks, we inject adversary experts into the global attention layers and MLP modules to progressively correct feature shifts induced by adversarial noise. These adversary experts are adaptively activated by the adversary detector, thereby avoiding unnecessary interference with clean images. Extensive experiments across multiple benchmarks demonstrate that ForensicsSAM achieves superior resistance to various adversarial attack methods, while also delivering state-of-the-art performance in image-level forgery detection and pixel-level forgery localization. The resource is available at https://github.com/siriusPRX/ForensicsSAM.

CV | Apr 23
AttDiff-GAN: A Hybrid Diffusion-GAN Framework for Facial Attribute Editing

Wenmin Huang, Weiqi Luo, Xiaochun Cao et al.

Facial attribute editing aims to modify target attributes while preserving attribute-irrelevant content and overall image fidelity. Existing GAN-based methods provide favorable controllability, but often suffer from weak alignment between style codes and attribute semantics. Diffusion-based methods can synthesize highly realistic images; however, their editing precision is limited by the entanglement of semantic directions among different attributes. In this paper, we propose AttDiff-GAN, a hybrid framework that combines GAN-based attribute manipulation with diffusion-based image generation. A key challenge in such integration lies in the inconsistency between one-step adversarial learning and multi-step diffusion denoising, which makes effective optimization difficult. To address this issue, we decouple attribute editing from image synthesis by introducing a feature-level adversarial learning scheme to learn explicit attribute manipulation, and then using the manipulated features to guide the diffusion process for image generation, while also removing the reliance on semantic direction-based editing. Moreover, we enhance style-attribute alignment by introducing PriorMapper, which incorporates facial priors into style generation, and RefineExtractor, which captures global semantic relationships through a Transformer for more precise style extraction. Experimental results on CelebA-HQ show that the proposed method achieves more accurate facial attribute editing and better preservation of non-target attributes than state-of-the-art methods in both qualitative and quantitative evaluations.

CV | Apr 23
LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation

Wenmin Huang, Weiqi Luo, Xiaochun Cao et al.

Facial attribute editing and style manipulation are crucial for applications like virtual avatars and photo editing. However, achieving precise control over facial attributes without altering unrelated features is challenging due to the complexity of facial structures and the strong correlations between attributes. While conditional GANs have shown progress, they are limited by accuracy issues and training instability. Diffusion models, though promising, face challenges in style manipulation due to the limited expressiveness of semantic directions. In this paper, we propose LatRef-Diff, a novel diffusion-based framework that addresses these limitations. We replace the traditional semantic directions in diffusion models with style codes and propose two methods for generating them: latent and reference guidance. Based on these style codes, we design a style modulation module that integrates them into the target image, enabling both random and customized style manipulation. This module incorporates learnable vectors, cross-attention mechanisms, and a hierarchical design to improve accuracy and image quality. Additionally, to enhance training stability while eliminating the need for paired images (e.g., before and after editing), we propose a forward-backward consistency training strategy. This strategy first removes the target attribute approximately using image-specific semantic directions and then restores it via style modulation, guided by perceptual and classification losses. Extensive experiments on CelebA-HQ demonstrate that LatRef-Diff achieves state-of-the-art performance in both qualitative and quantitative evaluations. Ablation studies validate the effectiveness of our model's design choices.

CV | Jun 15, 2025
Active Adversarial Noise Suppression for Image Forgery Localization

Rongxuan Peng, Shunquan Tan, Xianbo Mo et al.

Recent advances in deep learning have significantly propelled the development of image forgery localization. However, existing models remain highly vulnerable to adversarial attacks: imperceptible noise added to forged images can severely mislead these models. In this paper, we address this challenge with an Adversarial Noise Suppression Module (ANSM) that generates a defensive perturbation to suppress the attack effect of adversarial noise. We observe that forgery-relevant features extracted from adversarial and original forged images exhibit distinct distributions. To bridge this gap, we introduce Forgery-relevant Features Alignment (FFA) as a first-stage training strategy, which reduces distributional discrepancies by minimizing the channel-wise Kullback-Leibler divergence between these features. To further refine the defensive perturbation, we design a second-stage training strategy, termed Mask-guided Refinement (MgR), which incorporates a dual-mask constraint. MgR ensures that the perturbation remains effective for both adversarial and original forged images, recovering forgery localization accuracy to its original level. Extensive experiments across various attack algorithms demonstrate that our method significantly restores the forgery localization model's performance on adversarial images. Notably, when ANSM is applied to original forged images, the performance remains nearly unaffected. To the best of our knowledge, this is the first report of adversarial defense in image forgery localization tasks. We have released the source code and anti-forensics dataset.
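A channel-wise Kullback-Leibler divergence between two feature maps can be sketched as follows. This is only an illustration of the quantity named in the abstract, not the paper's FFA loss: turning each channel into a distribution with a softmax over its spatial positions, the direction KL(P || Q), and averaging over channels are all assumptions made here.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of activations."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def channelwise_kl(feat_p, feat_q, eps=1e-12):
    """Mean over channels of KL(P_c || Q_c).

    Each feature map is a list of channels; each channel is a flat list of
    spatial activations. The softmax normalisation is an assumption of this sketch.
    """
    assert len(feat_p) == len(feat_q), "feature maps need the same channel count"
    total = 0.0
    for ch_p, ch_q in zip(feat_p, feat_q):
        p = softmax(ch_p)
        q = softmax(ch_q)
        total += sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return total / len(feat_p)


# Hypothetical two-channel features from an original and an adversarial forged image.
original_feat = [[0.2, 1.5, -0.3, 0.0], [1.0, 1.0, 1.0, 1.0]]
adversarial_feat = [[1.4, 0.1, 0.9, -0.6], [0.5, 2.0, 0.1, 1.2]]
alignment_loss = channelwise_kl(adversarial_feat, original_feat)
```

Minimising `alignment_loss` with respect to the perturbation generator would pull the two feature distributions together; the value is zero when the features match exactly.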

CV | Aug 27, 2025
SDiFL: Stable Diffusion-Driven Framework for Image Forgery Localization

Yang Su, Shunquan Tan, Jiwu Huang

Driven by the new generation of multi-modal large models, such as Stable Diffusion (SD), image manipulation technologies have advanced rapidly, posing significant challenges to image forensics. However, existing image forgery localization methods, which heavily rely on labor-intensive and costly annotated data, are struggling to keep pace with these emerging image manipulation technologies. To address these challenges, we are the first to integrate both image generation and powerful perceptual capabilities of SD into an image forensic framework, enabling more efficient and accurate forgery localization. First, we theoretically show that the multi-modal architecture of SD can be conditioned on forgery-related information, enabling the model to inherently output forgery localization results. Then, building on this foundation, we specifically leverage the multi-modal framework of Stable Diffusion V3 (SD3) to enhance forgery localization performance. We leverage the multi-modal processing capabilities of SD3 in the latent space by treating image forgery residuals -- high-frequency signals extracted using specific high-pass filters -- as an explicit modality. This modality is fused into the latent space during training to enhance forgery localization performance. Notably, our method fully preserves the latent features extracted by SD3, thereby retaining the rich semantic information of the input image. Experimental results show that our framework achieves up to 12% improvements in performance on widely used benchmarking datasets compared to current state-of-the-art image forgery localization models. Encouragingly, the model demonstrates strong performance on forensic tasks involving real-world document forgery images and natural scene forging images, even when such data were entirely unseen during training.

CV | Nov 24, 2021
Universal Deep Network for Steganalysis of Color Image based on Channel Representation

Kangkang Wei, Weiqi Luo, Shunquan Tan et al.

Up to now, most existing steganalytic methods are designed for grayscale images, and they are not suitable for the color images that are widely used in current social networks. In this paper, we design a universal color image steganalysis network (called UCNet) in spatial and JPEG domains. The proposed method includes preprocessing, convolutional, and classification modules. To preserve the steganographic artifacts in each color channel, in the preprocessing module, we first separate the input image into three channels according to the corresponding embedding spaces (i.e., RGB for spatial steganography and YCbCr for JPEG steganography), then extract the image residuals with 62 fixed high-pass filters, and finally concatenate all truncated residuals for subsequent analysis rather than adding them together with normal convolution as existing CNN-based steganalyzers do. To accelerate the network convergence and effectively reduce the number of parameters, in the convolutional module, we carefully design three types of layers with different shortcut connections and group convolution structures to further learn high-level steganalytic features. In the classification module, we employ global average pooling and a fully connected layer for classification. We conduct extensive experiments on ALASKA II to demonstrate that the proposed method can achieve state-of-the-art results compared with modern CNN-based steganalyzers (e.g., SRNet and J-YeNet) in both spatial and JPEG domains, while keeping relatively low memory requirements and training time. Furthermore, we also provide necessary descriptions and many ablation experiments to verify the rationality of the network design.
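The preprocessing idea described above, filtering each color channel separately, truncating the residuals, and concatenating rather than summing them, can be sketched in plain Python. The two kernels and the truncation threshold below are illustrative assumptions; the paper's 62 fixed filters are not reproduced here.

```python
def highpass_residual(image, kernel, threshold=3):
    """Valid-mode 2D correlation of one channel with a kernel, clipped to [-threshold, threshold]."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            row.append(max(-threshold, min(threshold, acc)))  # truncation step
        out.append(row)
    return out


def channel_residuals(channels, kernels, threshold=3):
    """Filter every channel with every kernel and keep all maps side by side.

    Returning a list of maps corresponds to concatenation; nothing is summed
    across color channels.
    """
    return [highpass_residual(ch, k, threshold) for ch in channels for k in kernels]


# Hypothetical stand-ins for the fixed high-pass filter bank.
HORIZONTAL_DIFF = [[-1, 1]]
SECOND_ORDER = [[1, -2, 1]]

edge_row = [[0, 10, 10]]
clipped = highpass_residual(edge_row, HORIZONTAL_DIFF, threshold=3)
```

With three input channels and two kernels, `channel_residuals` returns six residual maps, which is the per-channel information that a summing convolution would merge.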

IV | Aug 30, 2021
Robust Privacy-Preserving Motion Detection and Object Tracking in Encrypted Streaming Video

Xianhao Tian, Peijia Zheng, Jiwu Huang

Video privacy leakage is becoming an increasingly severe public problem, especially in cloud-based video surveillance systems. This creates a need for secure cloud-based video applications, where the video is encrypted for privacy protection. Although some methods have been proposed for moving object detection and tracking in encrypted video, none performs robustly in complex and dynamic scenes. In this paper, we propose an efficient and robust privacy-preserving motion detection and multiple object tracking scheme for encrypted surveillance video bitstreams. By analyzing the properties of the video codec and format-compliant encryption schemes, we propose a new compressed-domain feature to capture motion information in complex surveillance scenarios. Based on this feature, we design an adaptive clustering algorithm for moving object segmentation with an accuracy of 4x4 pixels. We then propose a multiple object tracking scheme that uses Kalman filter estimation and adaptive measurement refinement. The proposed scheme does not require video decryption or full decompression and has a very low computation load. The experimental results demonstrate that our scheme achieves the best detection and tracking performance compared with existing works in the encrypted and compressed domain. Our scheme can be effectively used in complex surveillance scenarios with different challenges, such as camera movement/jitter, dynamic background, and shadows.
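For readers unfamiliar with the Kalman filter estimation mentioned above, the following is a minimal constant-velocity filter for a single coordinate of a tracked object. It is a generic textbook sketch, not the paper's tracker: the class name, the noise variances, and the diagonal process noise are assumptions, and the paper's adaptive measurement refinement is not modelled.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and a position-only measurement."""

    def __init__(self, position=0.0, velocity=0.0, init_var=100.0,
                 process_var=1e-3, meas_var=1.0):
        self.pos, self.vel = position, velocity
        # 2x2 state covariance stored as four scalars.
        self.p00, self.p01, self.p10, self.p11 = init_var, 0.0, 0.0, init_var
        self.q, self.r = process_var, meas_var

    def predict(self, dt=1.0):
        """Propagate the state one frame ahead: P <- F P F^T + Q with F = [[1, dt], [0, 1]]."""
        self.pos += dt * self.vel
        a = self.p00 + dt * self.p10
        b = self.p01 + dt * self.p11
        p00 = a + dt * b + self.q
        p01 = b
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, measurement):
        """Correct the prediction with a measured position (H = [1, 0])."""
        innovation = measurement - self.pos
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p10 / s  # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos


# Track a hypothetical object centroid that moves 2 units per frame.
tracker = ConstantVelocityKalman1D()
for frame in range(1, 31):
    tracker.predict()
    tracker.update(2.0 * frame)
```

In a multi-object setting, one such filter per coordinate and per object would supply the predicted positions against which new detections are matched.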

CVJul 6, 2021
Self-Adversarial Training incorporating Forgery Attention for Image Forgery Localization

Long Zhuo, Shunquan Tan, Bin Li et al.

Image editing techniques enable people to modify the content of an image without leaving visual traces and thus may cause serious security risks. Hence the detection and localization of these forgeries become quite necessary and challenging. Furthermore, unlike other tasks with extensive data, there is usually a lack of annotated forged images for training due to annotation difficulties. In this paper, we propose a self-adversarial training strategy and a reliable coarse-to-fine network that utilizes a self-attention mechanism to localize forged regions in forgery images. The self-attention module is based on a Channel-Wise High Pass Filter block (CW-HPF). CW-HPF leverages inter-channel relationships of features and extracts noise features by high pass filters. Based on the CW-HPF, a self-attention mechanism, called forgery attention, is proposed to capture rich contextual dependencies of intrinsic inconsistency extracted from tampered regions. Specifically, we append two types of attention modules on top of CW-HPF respectively to model internal interdependencies in spatial dimension and external dependencies among channels. We exploit a coarse-to-fine network to enhance the noise inconsistency between original and tampered regions. More importantly, to address the issue of insufficient training data, we design a self-adversarial training strategy that expands training data dynamically to achieve more robust performance. Specifically, in each training iteration, we perform adversarial attacks against our network to generate adversarial examples and train our model on them. Extensive experimental results demonstrate that our proposed algorithm steadily outperforms state-of-the-art methods by a clear margin in different benchmark datasets.

CVJun 24, 2021
Detection of Deepfake Videos Using Long Distance Attention

Wei Lu, Lingyi Liu, Junwei Luo et al.

With the rapid progress of deepfake techniques in recent years, facial video forgery can generate highly deceptive video content and bring severe security threats, which makes the detection of such forged videos both urgent and challenging. Most existing detection methods treat the problem as a vanilla binary classification problem. In this paper, the problem is treated as a special fine-grained classification problem, since the differences between fake and real faces are very subtle. It is observed that most existing face forgery methods leave some common artifacts in the spatial domain and the time domain, including generative defects in the spatial domain and inter-frame inconsistencies in the time domain. A spatial-temporal model is proposed with two components that capture spatial and temporal forgery traces, respectively, from a global perspective. The two components are designed using a novel long distance attention mechanism. The spatial-domain component is used to capture artifacts in a single frame, and the time-domain component is used to capture artifacts in consecutive frames. They generate attention maps in the form of patches. The attention method has a broader field of view, which contributes to better assembling global information and extracting local statistical information. Finally, the attention maps are used to guide the network to focus on pivotal parts of the face, as in other fine-grained classification methods. The experimental results on different public datasets demonstrate that the proposed method achieves state-of-the-art performance, and that the proposed long distance attention method can effectively capture pivotal parts for face forgery.

CRMay 19, 2021
FairCMS: Cloud Media Sharing with Fair Copyright Protection

Xiangli Xiao, Yushu Zhang, Leo Yu Zhang et al.

The onerous media sharing task prompts resource-constrained media owners to seek help from a cloud platform, i.e., storing media contents in the cloud and letting the cloud do the sharing. There are three key security/privacy problems that need to be solved in the cloud media sharing scenario, including data privacy leakage and access control in the cloud, infringement on the owner's copyright, and infringement on the user's rights. In view of the fact that no single technique can solve the above three problems simultaneously, two cloud media sharing schemes are proposed in this paper, named FairCMS-I and FairCMS-II. By cleverly utilizing the proxy re-encryption technique and the asymmetric fingerprinting technique, FairCMS-I and FairCMS-II solve the above three problems with different privacy/efficiency trade-offs. Among them, FairCMS-I focuses more on cloud-side efficiency while FairCMS-II focuses more on the security of the media content, which provides owners with flexibility of choice. In addition, FairCMS-I and FairCMS-II also have advantages over existing cloud media sharing efforts in terms of optional IND-CPA (indistinguishability under chosen-plaintext attack) security and high cloud-side efficiency, as well as exemption from needing a trusted third party. Furthermore, FairCMS-I and FairCMS-II allow owners to reap significant local resource savings and thus can be seen as the privacy-preserving outsourcing of asymmetric fingerprinting. Finally, the feasibility and efficiency of FairCMS-I and FairCMS-II are demonstrated by experiments.

CRMay 9, 2021
Improving Cost Learning for JPEG Steganography by Exploiting JPEG Domain Knowledge

Weixuan Tang, Bin Li, Mauro Barni et al.

Although significant progress in automatic learning of steganographic cost has been achieved recently, existing methods designed for spatial images are not readily applicable to JPEG images, which are more common media in daily life. The difficulties of migration mostly lie in the unique and complicated JPEG characteristics caused by the 8x8 DCT mode structure. To address the issue, in this paper we extend an existing automatic cost learning scheme to JPEG, where the proposed scheme, called JEC-RL (JPEG Embedding Cost with Reinforcement Learning), is explicitly tailored to the JPEG DCT structure. It works with the embedding action sampling mechanism under reinforcement learning, where a policy network learns the optimal embedding policies by maximizing the rewards provided by an environment network. The policy network is constructed following a domain-transition design paradigm, in which three modules are proposed: pixel-level texture complexity evaluation, DCT feature extraction, and mode-wise rearrangement. These modules operate in series, gradually extracting useful features from a decompressed JPEG image and converting them into embedding policies for DCT elements, while simultaneously considering JPEG characteristics including inter-block and intra-block correlations. The environment network is designed in a gradient-oriented way to provide stable reward values by using a wide architecture equipped with a fixed preprocessing layer with 8x8 DCT basis filters. Extensive experiments and ablation studies demonstrate that the proposed method can achieve good security performance for JPEG images against both advanced feature-based and modern CNN-based steganalyzers.

MMApr 14, 2021
Landmarking for Navigational Streaming of Stored High-Dimensional Media

Yuan Yuan, Gene Cheung, Pascal Frossard et al.

Modern media data such as 360 videos and light field (LF) images are typically captured in much higher dimensions than the observers' visual displays. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, who in response transmits the corresponding pre-encoded media data units (MDU) to the client one-by-one in sequence. Intra-coding an MDU (I-MDU) would result in a large bitrate but I-MDU can be randomly accessed, while inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost but imposes an order where the predictor must be first transmitted and decoded. From a compression perspective, the technical challenge is: how to achieve coding gain via inter-coding of MDUs, while enabling adequate random access for satisfactory user navigation. To address this problem, we propose landmarks, a selection of key MDUs from the high-dimensional media. Using landmarks as predictors, nearby MDUs in local neighborhoods are intercoded, resulting in a predictive MDU structure with controlled coding cost. It means that any requested MDU can be decoded by at most transmitting a landmark and an inter-coded MDU, enabling navigational random access. To build a landmarked MDU structure, we employ tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add/remove inter-coded MDUs as refinements using a fast branch-and-bound technique. Taking interactive LF images and viewport adaptive 360 images as illustrative applications, and I-, P- and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.

MMMar 25, 2021
MCTSteg: A Monte Carlo Tree Search-based Reinforcement Learning Framework for Universal Non-additive Steganography

Xianbo Mo, Shunquan Tan, Bin Li et al.

Recent research has shown that non-additive image steganographic frameworks effectively improve security performance by adjusting the distortion distribution. However, as far as we know, all of the existing non-additive proposals are based on handcrafted policies and can only be applied to a specific image domain, which severely prevents non-additive steganography from reaching its full potential. In this paper, we propose an automatic non-additive steganographic distortion learning framework called MCTSteg to remove the above restrictions. Guided by the reinforcement learning paradigm, we combine Monte Carlo Tree Search (MCTS) and a steganalyzer-based environmental model to build MCTSteg. MCTS makes sequential decisions to adjust the distortion distribution without human intervention. Our proposed environmental model is used to obtain feedback from each decision. Due to its self-learning characteristic and domain-independent reward function, MCTSteg is the first reported universal non-additive steganographic framework that can work in both spatial and JPEG domains. Extensive experimental results show that MCTSteg can effectively withstand the detection of both hand-crafted feature-based and deep-learning-based steganalyzers. In both spatial and JPEG domains, the security performance of MCTSteg steadily outperforms the state of the art by a clear margin under different scenarios.

MMFeb 1, 2021
Deep Learning-based Forgery Attack on Document Images

Lin Zhao, Changsheng Chen, Jiwu Huang

With the ongoing popularization of online services, digital document images have been used in various applications. Meanwhile, some deep learning-based text editing algorithms have emerged that alter the textual information of an image. In this work, we present a document forgery algorithm to edit practical document images. To achieve this goal, the limitations of existing text editing algorithms with respect to complicated characters and complex backgrounds are addressed by a set of network design strategies. First, the unnecessary confusion in the supervision data is avoided by disentangling the textual and background information in the source images. Second, to capture the structure of some complicated components, the text skeleton is provided as auxiliary information and the continuity in texture is considered explicitly in the loss function. Third, the forgery traces induced by the text editing operation are mitigated by some post-processing operations which consider the distortions from the print-and-scan channel. Quantitative comparisons of the proposed method and the existing approach have shown the advantages of our design: it reduces the reconstruction error measured in MSE by about 2/3 and improves the reconstruction quality measured in PSNR and in SSIM by 4 dB and 0.21, respectively. Qualitative experiments have confirmed that the reconstruction results of the proposed method are visually better than those of the existing approach. More importantly, we have demonstrated the performance of the proposed document forgery algorithm under a practical scenario where an attacker is able to alter the textual information in an identity document using only one sample in the target domain. The forged-and-recaptured samples created by the proposed text editing attack and recapturing operation have successfully fooled some existing document authentication systems.
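For readers less familiar with the reconstruction metrics quoted in this abstract, the following Python sketch computes MSE and PSNR for two equally sized pixel sequences. The function names, the flat-list image representation, and the 8-bit peak value of 255 are assumptions for illustration, not details from the paper.

```python
import math


def mse(reference, candidate):
    """Mean squared error between two equally long pixel sequences."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(reference, candidate)) / len(reference)


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mse(reference, candidate)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 4-pixel example: one pixel is off by 3 gray levels.
reference = [10, 20, 30, 40]
candidate = [13, 20, 30, 40]
error = mse(reference, candidate)
quality = psnr(reference, candidate)
```

Because PSNR is logarithmic in MSE, a given ratio of MSE values maps to a fixed dB difference for a single image pair. Figures averaged over a dataset need not follow that relation exactly.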

CVJan 13, 2021
Image Steganography based on Iteratively Adversarial Samples of A Synchronized-directions Sub-image

Xinghong Qin, Shunquan Tan, Bin Li et al.

Steganography today must withstand both feature-based steganalysis and convolutional neural network (CNN) based steganalysis. In this paper, we present a novel steganography scheme denoted as ITE-SYN (based on ITEratively adversarial perturbations onto a SYNchronized-directions sub-image), in which secret data is embedded with synchronized modification directions to enhance security, and then iteratively increased perturbations are added onto a sub-image to reduce the loss with respect to the cover class label of the target CNN classifier. First, an existing steganographic function is employed to compute initial costs. Then the cover image is decomposed into non-overlapping sub-images. After each sub-image is embedded, the costs are adjusted following the clustering modification directions profile. The next sub-image is then embedded with the adjusted costs, until all secret data has been embedded. If the target CNN classifier does not classify the stego image as a cover image, we change the adjusted costs in an adversarial manner according to the signs of the gradients back-propagated from the CNN classifier. A sub-image is then chosen to be re-embedded with the changed costs. The adversarial intensity is iteratively increased until the adversarial stego image can fool the target CNN classifier. Experiments demonstrate that the proposed method effectively enhances security against both conventional feature-based classifiers and CNN classifiers, including non-target CNN classifiers.

MMJan 5, 2021
Domain Generalization for Document Authentication against Practical Recapturing Attacks

Changsheng Chen, Shuzheng Zhang, Fengbo Lan et al.

A recapturing attack can be employed as a simple but effective anti-forensic tool for digital document images. Inspired by the document inspection process that compares a questioned document against a reference sample, we propose a document recapture detection scheme that employs a Siamese network to compare and extract distinct features in a recaptured document image. The proposed algorithm takes advantage of both metric learning and image forensic techniques. Instead of adopting a Euclidean distance-based loss function, we integrate the forensic similarity function with a triplet loss and a normalized softmax loss. After training with the proposed triplet selection strategy, the resulting feature embedding clusters the genuine samples near the reference while pushing the recaptured samples apart. In the experiment, we consider practical domain generalization problems, such as the variations in printing/imaging devices, substrates, recapturing channels, and document types. To evaluate the robustness of different approaches, we benchmark some popular off-the-shelf machine learning-based approaches, a state-of-the-art document image detection scheme, and the proposed schemes with different network backbones under various experimental protocols. Experimental results show that the proposed schemes with different network backbones consistently outperform the state-of-the-art approaches under different experimental settings. Specifically, under the most challenging scenario in our experiment, i.e., evaluation across different types of documents produced by different devices, we have achieved less than 5.00% APCER (Attack Presentation Classification Error Rate) and 5.56% BPCER (Bona Fide Presentation Classification Error Rate) with the proposed network with a ResNeXt101 backbone at the 5% BPCER decision threshold.
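The two error rates reported in this abstract can be computed from binary decisions as in the following Python sketch. The label encoding (1 for an attack, i.e. a recaptured document, and 0 for a bona fide one), the function name, and the sample data are hypothetical.

```python
def apcer_bpcer(labels, predictions):
    """Return (APCER, BPCER) for binary labels: 1 = attack, 0 = bona fide."""
    attacks = [p for y, p in zip(labels, predictions) if y == 1]
    bona_fide = [p for y, p in zip(labels, predictions) if y == 0]
    if not attacks or not bona_fide:
        raise ValueError("need at least one attack and one bona fide sample")
    # APCER: fraction of attack presentations accepted as bona fide.
    apcer = sum(1 for p in attacks if p == 0) / len(attacks)
    # BPCER: fraction of bona fide presentations rejected as attacks.
    bpcer = sum(1 for p in bona_fide if p == 1) / len(bona_fide)
    return apcer, bpcer


# Hypothetical decisions for four attack and five bona fide samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1]
apcer, bpcer = apcer_bpcer(labels, predictions)
```

In this toy example one of four attacks is missed and one of five genuine documents is rejected. In practice the decisions come from thresholding a score, and the threshold is typically fixed on a validation set to hit a target BPCER.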

MMDec 9, 2019
Universal Stego Post-processing for Enhancing Image Steganography

Bolin Chen, Weiqi Luo, Peijia Zheng et al.

It is well known that designing or improving the embedding cost has become a key issue for current steganographic methods. Unlike existing works, we propose a novel framework to enhance steganographic security via post-processing applied directly to the embedding units (i.e., pixel values and DCT coefficients) of the stego image. In this paper, we first analyze the characteristics of STCs (Syndrome-Trellis Codes), and then design the post-processing rule to ensure the correct extraction of the hidden message. Since steganographic artifacts are typically reflected in image residuals, we try to reduce the residual distance between the cover and the modified stego in order to enhance steganographic security. To this end, we model the post-processing as a non-linear integer programming problem and solve it via heuristic search. In addition, we carefully determine several important issues in the proposed post-processing, such as the candidate embedding units to be modified, the direction and amplitude of post-modification, the adaptive filters for obtaining residuals, and the distance measure of residuals. Extensive experimental results evaluated on both hand-crafted steganalytic features and deep learning-based ones demonstrate that the proposed method can effectively enhance the security of most modern steganographic methods in both spatial and JPEG domains.

MMNov 12, 2019
CALPA-NET: Channel-pruning-assisted Deep Residual Network for Steganalysis of Digital Images

Shunquan Tan, Weilong Wu, Zilong Shao et al.

Over the past few years, detection performance improvements of deep-learning based steganalyzers have usually been achieved through structure expansion. However, an excessively expanded structure results in huge computational cost and storage overheads, and consequently in difficulty of training and deployment. In this paper we propose CALPA-NET, a ChAnneL-Pruning-Assisted deep residual network architecture search approach to shrink the network structure of existing vast, over-parameterized deep-learning based steganalyzers. We observe that the broad inverted-pyramid structure of existing deep-learning based steganalyzers might contradict the well-established model diversity oriented philosophy, and is therefore not suitable for steganalysis. A hybrid criterion combined with two network pruning schemes is then introduced to adaptively shrink every involved convolutional layer in a data-driven manner. The resulting network architecture presents a slender bottleneck-like structure. We have conducted extensive experiments on the BOSSBase+BOWS2 dataset, the more diverse ALASKA dataset, and even a large-scale subset extracted from the ImageNet CLS-LOC dataset. The experimental results show that the model structure generated by our proposed CALPA-NET can achieve comparable performance with less than two percent of the parameters and about one third of the FLOPs of the original steganalytic model. The new model possesses even better adaptivity, transferability, and scalability.

MMSep 17, 2019
Enhancing JPEG Steganography using Iterative Adversarial Examples

Huaxiao Mo, Tingting Song, Bolin Chen et al.

Convolutional Neural Network (CNN) based methods have significantly improved the performance of image steganalysis compared with conventional ones based on hand-crafted features. However, many existing studies in computer vision have pointed out that these effective CNN-based methods can be easily fooled by adversarial examples. In this paper, we propose a novel steganography framework based on adversarial examples applied in an iterative manner. The proposed framework starts from an existing embedding cost, such as J-UNIWARD in this work, and then updates the cost iteratively based on adversarial examples derived from a series of steganalytic networks until satisfactory results are achieved. We carefully analyze two important factors that affect the security performance of the proposed framework, i.e., the percentage of selected gradients with larger amplitude and the adversarial intensity used to modify the embedding cost. The experimental results evaluated on three modern steganalytic models, including GFR, SCA-GFR and SRNet, show that the proposed framework is very promising for enhancing the security performance of JPEG steganography.

MMAug 6, 2019
New Design Paradigm of Distortion Cost Function for Efficient JPEG Steganography

Wenkang Su, Jiangqun Ni, Xianglei Hu et al.

Recently, with the introduction of JPEG phase-aware steganalysis features, e.g., GFR, the design of JPEG steganographic distortion cost function turns to maintain not only the statistical undetectability in DCT domain but also in spatial domain. To tackle this issue, this paper presents a novel paradigm for the design of JPEG steganographic distortion cost function, which calculates the distortion cost via a generalized Distortion Cost Domain Transformation (DCDT) function. The proposed function comprises the decompressed pixel block embedding changes and their corresponding embedding distortion costs for unit change, where the pixel embedding distortion costs are represented in a more general exponential model, aiming to flexibly allocate the embedding data. In this way, the JPEG steganography could be formulated as the optimization problem of minimizing the overall distortion cost in its decompressed spatial domain, which is equivalent to maximizing its statistical undetectability against JPEG phase-aware steganalysis features. Experimental results show that the proposed DCDT equipped with HiLL (a spatial steganographic distortion cost function) is superior to other state-of-the-art JPEG steganographic schemes, e.g., UERD, J-UNIWARD, and GUED in resisting the detection of JPEG phase-aware feature-based steganalyzers GFR and SCA-GFR, and rivals BET-HiLL with one order of magnitude lower computational complexity, along with the possibility of being further improved by considering the mutually dependent embedding interactions. In addition, the proposed DCDT is also verified to be effective for different image databases and quality factors.

MMAug 22, 2018
Identification of Deep Network Generated Images Using Disparities in Color Components

Haodong Li, Bin Li, Shunquan Tan et al.

With powerful deep network architectures, such as generative adversarial networks, one can easily generate photorealistic images. Although the generated images are not intended for fooling humans or deceiving biometric authentication systems, research communities and public media have shown great concern about the security issues caused by these images. This paper addresses the problem of identifying deep network generated (DNG) images. Taking the differences between camera imaging and DNG image generation into consideration, we analyze the disparities between DNG images and real images in different color components. We observe that DNG images are more distinguishable from real ones in the chrominance components, especially in the residual domain. Based on these observations, we propose a feature set to capture color image statistics for identifying DNG images. Additionally, we evaluate several detection situations, including cases where the training and testing data are matched or mismatched in image sources or generative models, and detection with only real images. Extensive experimental results show that the proposed method can accurately identify DNG images and outperforms existing methods when the training and testing data are mismatched. Moreover, when the GAN model is unknown, our method also achieves good performance with one-class classification by using only real images for training.

MMMar 24, 2018
CNN Based Adversarial Embedding with Minimum Alteration for Image Steganography

Weixuan Tang, Bin Li, Shunquan Tan et al.

Historically, steganographic schemes were designed in a way to preserve image statistics or steganalytic features. Since most of the state-of-the-art steganalytic methods employ a machine learning (ML) based classifier, it is reasonable to consider countering steganalysis by trying to fool the ML classifiers. However, simply applying perturbations on stego images as adversarial examples may lead to the failure of data extraction and introduce unexpected artefacts detectable by other classifiers. In this paper, we present a steganographic scheme with a novel operation called adversarial embedding, which achieves the goal of hiding a stego message while at the same time fooling a convolutional neural network (CNN) based steganalyzer. The proposed method works under the conventional framework of distortion minimization. Adversarial embedding is achieved by adjusting the costs of image element modifications according to the gradients backpropagated from the CNN classifier targeted by the attack. Therefore, the modification direction has a higher probability to be the same as the sign of the gradient. In this way, the so-called adversarial stego images are generated. Experiments demonstrate that the proposed steganographic scheme is secure against the targeted adversary-unaware steganalyzer. In addition, it deteriorates the performance of other adversary-aware steganalyzers, opening the way to a new class of modern steganographic schemes capable of overcoming powerful CNN-based steganalysis.
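The cost adjustment described in this abstract can be illustrated with a minimal Python sketch. It assumes that each element has separate costs for a +1 and a -1 change, that `gradients` holds a quantity whose sign marks the favored modification direction, and that favored changes are made cheaper by a scaling factor `alpha`. The names, the value of `alpha`, and the exact update rule are hypothetical, not taken from the paper.

```python
def adjust_costs(rho_plus, rho_minus, gradients, alpha=2.0):
    """Bias +1/-1 embedding costs toward the sign of each gradient.

    A positive gradient makes the +1 change cheaper and the -1 change more
    expensive; a negative gradient does the opposite; zero leaves both as-is.
    """
    new_plus, new_minus = [], []
    for cost_p, cost_m, grad in zip(rho_plus, rho_minus, gradients):
        if grad > 0:
            cost_p, cost_m = cost_p / alpha, cost_m * alpha
        elif grad < 0:
            cost_p, cost_m = cost_p * alpha, cost_m / alpha
        new_plus.append(cost_p)
        new_minus.append(cost_m)
    return new_plus, new_minus


# Three elements with symmetric initial costs (hypothetical values).
plus, minus = adjust_costs([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.5, -0.2, 0.0])
```

The adjusted costs would then be passed to an ordinary distortion-minimizing coder, so message extraction is unaffected by the adversarial bias.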

MMMar 13, 2018
WISERNet: Wider Separate-then-reunion Network for Steganalysis of Color Images

Jishen Zeng, Shunquan Tan, Guangqing Liu et al.

Until recently, deep steganalyzers in the spatial domain have all been designed for gray-scale images. In this paper, we propose WISERNet (the wider separate-then-reunion network) for steganalysis of color images. We provide a theoretical rationale for the claim that the summation in normal convolution is a sort of "linear collusion attack" which preserves strongly correlated patterns while impairing uncorrelated noise. Therefore, in the bottom convolutional layer, which aims at suppressing correlated image contents, we adopt separate channel-wise convolution without summation instead. Conversely, in the upper convolutional layers we believe that the summation in normal convolution is beneficial. We therefore adopt united normal convolution in those layers and make them remarkably wider to reinforce the effect of the "linear collusion attack". As a result, our proposed wide-and-shallow, separate-then-reunion network structure is specifically suitable for color image steganalysis. We have conducted extensive experiments on color image datasets generated from BOSSBase raw images and another large-scale dataset which contains 100,000 raw images, with different demosaicking algorithms and down-sampling algorithms. The experimental results show that our proposed network outperforms other state-of-the-art color image steganalytic models in the literature, whether hand-crafted or learned using deep networks, by a clear margin. Specifically, it is noted that the detection performance gain is achieved with less than half the complexity of the most advanced deep-learning steganalyzer as far as we know, which is scarce in the literature.

MMJul 25, 2017
Anti-Forensics of Camera Identification and the Triangle Test by Improved Fingerprint-Copy Attack

Haodong Li, Weiqi Luo, Quanquan Rao et al.

The fingerprint-copy attack aims to confuse camera identification based on sensor pattern noise. However, the triangle test shows that forged images that have undergone a fingerprint-copy attack share a non-PRNU (photo-response non-uniformity) component with every stolen image, and the test can thus detect the attack. In this paper, we propose an improved fingerprint-copy attack scheme. Our main idea is to superimpose the estimated fingerprint onto the target image in a dispersed manner, by employing a block-wise method and using the stolen images randomly and partially. We also develop a practical method to determine the strength of the superimposed fingerprint based on objective image quality. In this way, the impact of the non-PRNU component on the triangle test is reduced, and our improved fingerprint-copy attack is difficult to detect. Experiments on 2,900 images from 4 cameras show that our scheme can effectively fool camera identification and, at the same time, significantly degrade the performance of the triangle test.

MMJan 5, 2017
VideoSet: A Large-Scale Compressed Video Quality Dataset Based on JND Measurement

Haiqiang Wang, Ioannis Katsavounidis, Jiantong Zhou et al.

A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., $1920 \times 1080$, $1280 \times 720$, $960 \times 540$ and $640 \times 360$). For each of the 880 video clips, we encode it using the H.264 codec with $QP=1, \cdots, 51$ and measure the first three JND points with 30+ subjects. The dataset is called the "VideoSet", which is an acronym for "Video Subject Evaluation Test (SET)". This work describes the subjective test procedure, detection and removal of outlying measured data, and the properties of collected JND data. Finally, the significance and implications of the VideoSet to future video coding research and standardization efforts are pointed out. All source/coded video clips as well as measured JND data included in the VideoSet are available to the public in the IEEE DataPort.

MMNov 10, 2016
Large-scale JPEG steganalysis using hybrid deep-learning framework

Jishen Zeng, Shunquan Tan, Bin Li et al.

Adoption of deep learning in image steganalysis is still in its initial stage. In this paper we propose a generic hybrid deep-learning framework for JPEG steganalysis incorporating the domain knowledge behind rich steganalytic models. Our proposed framework involves two main stages. The first stage is hand-crafted, corresponding to the convolution phase and the quantization & truncation phase of the rich models. The second stage is a compound deep neural network containing multiple deep subnets in which the model parameters are learned in the training procedure. We provide experimental evidence and theoretical reflections to argue that the introduction of threshold quantizers, though it disables the gradient-descent-based learning of the bottom convolution phase, is indeed cost-effective. We have conducted extensive experiments on a large-scale dataset extracted from ImageNet. The primary dataset used in our experiments contains 500,000 cover images, while our largest dataset contains five million cover images. Our experiments show that the integration of quantization and truncation into deep-learning steganalyzers does boost the detection performance by a clear margin. Furthermore, we demonstrate that our framework is insensitive to JPEG blocking artifact alterations, and the learned model can be easily transferred to a different attacking target and even a different dataset. These properties are of critical importance in practical applications.

MMMar 16, 2015
Identification of Image Operations Based on Steganalytic Features

Haodong Li, Weiqi Luo, Xiaoqing Qiu et al.

Image forensics has attracted wide attention during the past decade. Though many forensic methods have been proposed to identify image forgeries, most of them are targeted ones, since their proposed features are highly dependent on the image operation under investigation. The performance of well-designed features for detecting the targeted operation usually degrades significantly for other operations. On the other hand, a wise attacker can perform anti-forensics to fool the existing forensic methods, which makes countering anti-forensics an urgent need. In this paper, we try to find a universal feature to detect various image processing and anti-forensic operations. Based on our extensive experiments and analysis, we find that any image processing/anti-forensic operation inevitably modifies many image pixels. This changes some inherent statistics within original images, which is similar to the case of steganography. Therefore, we model image processing/anti-forensic operations as steganography problems, and propose a detection strategy that applies steganalytic features. With some advanced steganalytic features, we are able to detect various image operations and further identify their types. In our experiments, we have tested several steganalytic features on 11 different kinds of typical image processing operations and 4 kinds of anti-forensic operations. The experimental results show that the proposed strategy significantly outperforms the existing forensic methods in both effectiveness and universality.

MMMay 29, 2014
JPEG Noises beyond the First Compression Cycle

Bin Li, Tian-Tsong Ng, Xiaolong Li et al.

This paper focuses on the JPEG noises, which include the quantization noise and the rounding noise, during a JPEG compression cycle. The JPEG noises in the first compression cycle have been well studied; however, less attention has so far been paid to the JPEG noises in higher compression cycles. In this work, we present a statistical analysis of JPEG noises beyond the first compression cycle. To our knowledge, this is the first work on this topic. We find that the noise distributions in higher compression cycles are different from those in the first compression cycle, and that they depend on the quantization parameters used between two successive cycles. To demonstrate the benefits of the statistical analysis, we provide two applications that employ the derived noise distributions to uncover JPEG compression history with state-of-the-art performance.