Rongrong Ni

CV
12 papers
100 citations
Novelty 52%
AI Score 26

12 Papers

CV Jun 12, 2023
NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection

Yu Chen, Yang Yu, Rongrong Ni et al.

Deepfake technologies empowered by deep learning are rapidly evolving, creating new security concerns for society. Existing multimodal detection methods usually capture audio-visual inconsistencies to expose Deepfake videos. However, advanced Deepfake techniques now calibrate audio and video in the critical phoneme-viseme regions, producing more realistic tampering and posing new challenges. To address this problem, we propose a novel Deepfake detection method, termed NPVForensics, that mines the correlation between Non-critical Phonemes and Visemes. Firstly, we propose the Local Feature Aggregation block with Swin Transformer (LFA-ST) to construct non-critical phoneme-viseme and corresponding facial feature streams effectively. Secondly, we design a loss function for the fine-grained motion of the talking face to measure the evolutionary consistency of non-critical phoneme-viseme pairs. Next, we design a phoneme-viseme awareness module for cross-modal feature fusion and representation alignment, so that the modality gap is reduced and the intrinsic complementarity of the two modalities is better exploited. Finally, a self-supervised pre-training strategy is used to thoroughly learn the audio-visual correspondences in natural videos. In this manner, our model can be easily adapted to downstream Deepfake datasets with fine-tuning. Extensive experiments on existing benchmarks demonstrate that the proposed approach outperforms state-of-the-art methods.

IV Feb 2, 2021
Image Splicing Detection, Localization and Attribution via JPEG Primary Quantization Matrix Estimation and Clustering

Yakun Niu, Benedetta Tondi, Yao Zhao et al.

Detection of inconsistencies of double JPEG artefacts across different image regions is often used to detect local image manipulations, like image splicing, and to localize them. In this paper, we move one step further, proposing an end-to-end system that, in addition to detecting and localizing spliced regions, can also distinguish regions coming from different donor images. We assume that both the spliced regions and the background image have undergone a double JPEG compression, and use a local estimate of the primary quantization matrix to distinguish between spliced regions taken from different sources. To do so, we cluster the image blocks according to the estimated primary quantization matrix and refine the result by means of morphological reconstruction. The proposed method can work in a wide variety of settings including aligned and non-aligned double JPEG compression, and regardless of whether the second compression is stronger or weaker than the first one. We validated the proposed approach by means of extensive experiments showing its superior performance with respect to baseline methods working in similar conditions.

MM Jan 26, 2021
Efficient video integrity analysis through container characterization

Pengpeng Yang, Daniele Baracchi, Massimo Iuliani et al.

Most video forensic techniques look for traces within the data stream; these traces, however, are mostly ineffective when dealing with strongly compressed or low-resolution videos. Recent research has highlighted that useful forensic traces are also left in the video container structure, offering the opportunity to understand the life-cycle of a video file without looking at the media stream itself. In this paper we introduce a container-based method to identify the software used to perform a video manipulation and, in most cases, the operating system of the source device. As opposed to the state of the art, the proposed method is both efficient and effective and can also provide a simple explanation for its decisions. This is achieved by using a decision-tree-based classifier applied to a vectorial representation of the video container structure. We conducted an extensive validation on a dataset of 7000 video files including both software-manipulated content (ffmpeg, Exiftool, Adobe Premiere, Avidemux, and Kdenlive) and videos exchanged through social media platforms (Facebook, TikTok, Weibo and YouTube). This dataset has been made available to the research community. The proposed method achieves an accuracy of 97.6% in distinguishing pristine from tampered videos and classifying the editing software, even when the video is cut without re-encoding or when it is downscaled to the size of a thumbnail. Furthermore, it is capable of correctly identifying the operating system of the source device for most of the tampered videos.

CV Oct 27, 2020
Mining Generalized Features for Detecting AI-Manipulated Fake Faces

Yang Yu, Rongrong Ni, Yao Zhao

Recently, AI-based face manipulation techniques have developed rapidly, raising new security issues in society. Although existing detection methods consider different categories of fake faces, their performance on fake faces produced by "unseen" manipulation techniques is still poor due to the distribution bias across manipulation techniques. To solve this problem, we propose a novel framework that focuses on mining intrinsic features and further eliminating the distribution bias to improve the generalization ability. Firstly, we focus on mining the intrinsic clues in the channel difference image (CDI) and spectrum image (SI), which stem from the camera imaging process and from the indispensable step in the AI manipulation process. Then, we introduce Octave Convolution (OctConv) and an attention-based fusion module to effectively and adaptively mine intrinsic features from CDI and SI. Finally, we design an alignment module to eliminate the bias of manipulation techniques and obtain a more generalized detection framework. We evaluate the proposed framework on four categories of fake-face datasets with the most popular and state-of-the-art manipulation techniques, and achieve very competitive performance. To further verify the generalization ability of the proposed framework, we conduct experiments on cross-manipulation techniques, and the results show the advantages of our method.

CV May 12, 2020
Increased-confidence adversarial examples for deep learning counter-forensics

Wenjie Li, Benedetta Tondi, Rongrong Ni et al.

Transferability of adversarial examples is a key issue for applying this kind of attack against multimedia forensics (MMF) techniques based on Deep Learning (DL) in a real-life setting. Adversarial example transferability, in fact, would open the way to the deployment of successful counter-forensics attacks also in cases where the attacker does not have full knowledge of the to-be-attacked system. Some preliminary works have shown that adversarial examples against CNN-based image forensics detectors are in general non-transferable, at least when the basic versions of the attacks implemented in the most popular libraries are adopted. In this paper, we introduce a general strategy to increase the strength of the attacks and evaluate their transferability as this strength varies. We experimentally show that, in this way, attack transferability can be largely increased, at the expense of a larger distortion. Our research confirms the security threats posed by the existence of adversarial examples even in multimedia forensics scenarios, thus calling for new defense strategies to improve the security of DL-based MMF techniques.

MM Oct 17, 2019
Dual-Domain Fusion Convolutional Neural Network for Contrast Enhancement Forensics

Pengpeng Yang, Rongrong Ni, Yao Zhao et al.

Contrast enhancement (CE) forensics techniques have always been of great interest to the image forensics community, as they can be an effective tool for recovering image history and identifying tampered images. Although several CE forensic algorithms have been proposed, their accuracy and robustness against some kinds of processing are still unsatisfactory. To address this deficiency, in this paper we propose a new framework based on a dual-domain fusion convolutional neural network that fuses the features of the pixel and histogram domains for CE forensics. Specifically, we first present a pixel-domain convolutional neural network (P-CNN) to automatically capture the patterns of contrast-enhanced images in the pixel domain. Then, we present a histogram-domain convolutional neural network (H-CNN) to extract the features in the histogram domain. The feature representations of the pixel and histogram domains are fused and fed into two fully connected layers for the classification of contrast-enhanced images. Experimental results show that the proposed method achieves better performance and is robust against pre-JPEG compression and anti-forensics attacks. In addition, a strategy for improving the performance of CNN-based forensics is explored, which could provide guidance for the design of CNN-based forensics tools.

MM Jun 5, 2018
Double JPEG Compression Detection by Exploring the Correlations in DCT Domain

Pengpeng Yang, Rongrong Ni, Yao Zhao

In the field of digital image processing, the JPEG image compression technique has been widely applied, and numerous image processing programs support it. Images that have undergone double JPEG compression are likely to have been tampered with. Therefore, double JPEG compression detection schemes can provide an important clue for image forgery detection. In this paper, we propose an effective algorithm to detect double JPEG compression with different quality factors. Firstly, the quantized DCT coefficients with the same frequency are extracted to build new data matrices. Then, considering the effect of direction on the correlation between adjacent positions in the DCT domain, twelve kinds of high-pass filter templates with different directions are applied and the transition probability matrix is calculated for each filtered data matrix. Furthermore, principal component analysis and a support vector machine are applied to reduce the feature dimension and train a classifier, respectively. Experimental results have demonstrated that the proposed method is effective and has comparable performance.
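
The probability matrix computed over the filtered data, read here as a Markov transition probability matrix, can be illustrated with a minimal sketch. It is not the authors' implementation: a single horizontal first-order difference stands in for the twelve directional high-pass templates, and the truncation threshold `T` and the function name are assumptions made for illustration.

```python
def transition_probability_matrix(matrix, T=2):
    """Horizontal transition probabilities of a truncated difference array.

    Assumptions (not from the abstract): one horizontal first-order
    difference replaces the twelve directional templates, and the
    differences are clipped to the range [-T, T].
    """
    size = 2 * T + 1
    counts = [[0] * size for _ in range(size)]
    for row in matrix:
        # High-pass filtering: difference between horizontally adjacent values.
        diff = [max(-T, min(T, row[j] - row[j + 1])) for j in range(len(row) - 1)]
        # Count transitions between neighbouring difference values.
        for a, b in zip(diff, diff[1:]):
            counts[a + T][b + T] += 1
    # Normalise each row so that it holds P(next = b | current = a).
    tpm = []
    for row_counts in counts:
        total = sum(row_counts)
        tpm.append([c / total if total else 0.0 for c in row_counts])
    return tpm


# Toy matrix of same-frequency quantized DCT coefficients (made-up values).
coeffs = [
    [3, 1, 0, 0, 1],
    [2, 2, 1, 0, 0],
    [4, 1, 1, 0, 0],
]
tpm = transition_probability_matrix(coeffs, T=2)
```

With `T = 2` the matrix has 5 x 5 = 25 entries per filter direction, which is the kind of feature vector that a dimensionality-reduction step such as PCA would then compress before classification.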

CV Mar 29, 2018
Security Consideration For Deep Learning-Based Image Forensics

Wei Zhao, Pengpeng Yang, Rongrong Ni et al.

Recently, the image forensics community has paid attention to the design of effective algorithms based on deep learning, and results have shown that combining the domain knowledge of image forensics with deep learning achieves more robust and better performance than traditional schemes. Instead of improving performance further, this paper considers the security of deep learning-based methods in the field of image forensics. To the best of our knowledge, this is the first work focusing on this topic. Specifically, we experimentally find that deep learning-based methods fail when slight noise is added to the images (adversarial images). Furthermore, two kinds of strategies are proposed to strengthen the security of deep learning-based methods. Firstly, an extra penalty term is added to the loss function, defined as the 2-norm of the gradient of the loss with respect to the input images. Secondly, a novel training method is adopted that trains the model on a mix of normal and adversarial images. Experimental results show that the proposed algorithm achieves good performance even on adversarial images and provides a security consideration for deep learning-based image forensics.
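
The gradient-norm penalty can be sketched on a toy model. The example below is an illustration under stated assumptions, not the paper's training code: a logistic-regression classifier stands in for the CNN so that the input gradient has a closed form, and the penalty weight `lam` and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def penalized_loss(w, b, x, y, lam=0.1):
    """Cross-entropy loss plus lam * ||d loss / d x||_2 for one sample.

    Assumptions: a logistic-regression model replaces the CNN, and lam
    (the penalty weight) is a hypothetical hyperparameter.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    # For this model the gradient of the loss w.r.t. the input is (p - y) * w.
    grad_x = [(p - y) * wi for wi in w]
    grad_norm = math.sqrt(sum(g * g for g in grad_x))
    return loss + lam * grad_norm, loss, grad_norm


total, loss, grad_norm = penalized_loss(w=[0.5, -1.0], b=0.0, x=[1.0, 2.0], y=1)
```

Minimizing the penalized loss pushes the model toward regions where small input perturbations change the loss less, which is the intuition behind using the input-gradient norm as a regularizer.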

MM Mar 13, 2018
Robust Contrast Enhancement Forensics Using Pixel and Histogram Domain CNNs

Pengpeng Yang, Rongrong Ni, Yao Zhao et al.

Contrast enhancement (CE) forensics has always been of concern to the image forensics community. It can provide an effective tool for recovering image history and identifying tampered images. Although several CE forensic algorithms have been proposed, their robustness against some processing operations, such as JPEG compression and anti-forensic attacks, is still unsatisfactory. To address this deficiency, in this paper we first present a discriminability analysis of CE forensics in the pixel and gray-level histogram domains. Then, in these two domains, two end-to-end methods based on convolutional neural networks (P-CNN, H-CNN) are proposed to achieve robust CE forensics against pre-JPEG compression and anti-forensics attacks. Experimental results show that the proposed methods achieve much better performance than the state-of-the-art schemes for CE detection when no other operation is applied, and comparable performance when pre-JPEG compression and anti-forensics attacks are applied.

IV Feb 20, 2018
Non-Local Graph-Based Prediction For Reversible Data Hiding In Images

Qi Chang, Gene Cheung, Yao Zhao et al.

Reversible data hiding (RDH) is desirable in applications where both the hidden message and the cover medium need to be recovered without loss. Among the many RDH approaches, prediction-error expansion (PEE) consists of two steps: i) prediction of a target pixel value, and ii) embedding according to the value of the prediction error. In general, higher prediction performance leads to larger embedding capacity and/or lower signal distortion. Leveraging recent advances in graph signal processing (GSP), we pose pixel prediction as a graph-signal restoration problem, where the appropriate edge weights of the underlying graph are computed using a similar patch searched in a semi-local neighborhood. Specifically, for each candidate patch, we first examine the eigenvalues of its structure tensor to estimate its local smoothness. If it is sufficiently smooth, we pose a maximum a posteriori (MAP) problem using either a quadratic Laplacian regularizer or a graph total variation (GTV) term as signal prior. While the MAP problem using the first prior has a closed-form solution, we design an efficient algorithm for the second prior using the alternating direction method of multipliers (ADMM) with nested proximal gradient descent. Experimental results show that, with better-quality GSP-based prediction, the visual quality of the embedded image at low capacity noticeably exceeds that of state-of-the-art methods.
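
The two PEE steps can be made concrete with a minimal sketch of textbook prediction-error expansion. It does not model the graph-based predictor: the predicted value is simply passed in, and it is assumed that the decoder can reproduce the same prediction. Overflow handling at the ends of the pixel range is omitted, and the function names are hypothetical.

```python
def pee_embed(pixel, predicted, bit):
    """Embed one bit by expanding the prediction error: e' = 2e + bit."""
    error = pixel - predicted
    return predicted + 2 * error + bit


def pee_extract(marked, predicted):
    """Recover the hidden bit and the original pixel value."""
    expanded = marked - predicted
    bit = expanded % 2            # Python's % yields 0 or 1 here, even for negatives
    error = (expanded - bit) // 2
    return bit, predicted + error


marked = pee_embed(pixel=120, predicted=118, bit=1)
bit, restored = pee_extract(marked, predicted=118)
```

The distortion introduced is `error + bit`, so a predictor with smaller errors changes the marked pixel less, which is why better prediction improves visual quality at a given capacity.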

CR Feb 2, 2018
Secure Detection of Image Manipulation by means of Random Feature Selection

Zhipeng Chen, Benedetta Tondi, Xiaolong Li et al.

We address the problem of data-driven image manipulation detection in the presence of an attacker with limited knowledge about the detector. Specifically, we assume that the attacker knows the architecture of the detector, the training data and the class of features V the detector can rely on. To gain an advantage in the arms race with the attacker, the analyst designs the detector by relying on a subset of features chosen at random in V. Not knowing the exact feature set, the adversary attacks a version of the detector based on the entire feature set. In this way, the effectiveness of the attack diminishes, since there is no guarantee that attacking a detector working in the full feature space will result in a successful attack against the reduced-feature detector. We theoretically prove that, thanks to random feature selection, the security of the detector increases significantly at the expense of a negligible loss of performance in the absence of attacks. We also provide an experimental validation of the proposed procedure by focusing on the detection of two specific kinds of image manipulations, namely adaptive histogram equalization and median filtering. The experiments confirm the gain in security at the expense of a negligible loss of performance in the absence of attacks.
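
The random feature selection step can be sketched as follows. This is only an illustration of the idea, assuming that the analyst's secret is a seed for a pseudo-random generator; the function names, the feature-set size and the subset size are hypothetical, and the classifier trained on the reduced features is not shown.

```python
import random


def select_features(num_features, subset_size, secret_seed):
    """Pick the detector's secret feature subset from the full set V."""
    rng = random.Random(secret_seed)
    return sorted(rng.sample(range(num_features), subset_size))


def project(feature_vector, selected):
    """Keep only the features the detector actually relies on."""
    return [feature_vector[i] for i in selected]


# Hypothetical sizes: |V| = 20 features, of which the detector keeps 5.
selected = select_features(num_features=20, subset_size=5, secret_seed=1234)
full_vector = [float(i) for i in range(20)]
reduced_vector = project(full_vector, selected)
```

An attacker who knows V but not the seed can only attack the full-feature detector, so a perturbation tuned in the 20-dimensional space is not guaranteed to cross the decision boundary of the detector trained on the 5 selected features.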

CV Mar 15, 2017
Source Camera Identification Based On Content-Adaptive Fusion Network

Pengpeng Yang, Wei Zhao, Rongrong Ni et al.

Source camera identification is still a hard task in the forensics community, especially when the query image is small. In this paper, we propose a solution for identifying the source camera of small-size images: a content-adaptive fusion network. To learn a better feature representation from the input data, content-adaptive convolutional neural networks (CA-CNN) are constructed by adding a convolutional layer in the pre-processing stage. Moreover, to capture more comprehensive information, we run three CA-CNNs in parallel (CA3-CNN, CA5-CNN and CA7-CNN) to obtain the content-adaptive fusion network. The three CA-CNNs differ in the convolutional kernel size of the pre-processing layer. The experimental results show that the proposed method is practicable and satisfactory.