Xiaoxiao Hu

h-index: 8

5 papers

161 citations

Novelty: 51%

AI Score: 30

Ranked #136,704 of 194,257 authors (top 70%); #45,046 in CV (top 76%)

5 Papers

16.7 | CV | Jun 12, 2022 | Code

Bootstrapping Multi-view Representations for Fake News Detection

Qichao Ying, Xiaoxiao Hu, Yangming Zhou et al.

Previous research on multimedia fake news detection includes a series of complex feature extraction and fusion networks to gather useful information from the news. However, how cross-modal consistency relates to the fidelity of news and how features from different modalities affect the decision-making are still open questions. This paper presents a novel scheme of Bootstrapping Multi-view Representations (BMR) for fake news detection. Given a multimodal news item, we extract representations from three views: the text, the image pattern, and the image semantics. Improved Multi-gate Mixture-of-Expert networks (iMMoE) are proposed for feature refinement and fusion. Representations from each view are separately used to coarsely predict the fidelity of the whole news item, and the multimodal representations are used to predict the cross-modal consistency. With the prediction scores, we reweigh each view of the representations and bootstrap them for fake news detection. Extensive experiments conducted on typical fake news detection datasets show that the proposed BMR outperforms state-of-the-art schemes.
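
The reweighing step in this abstract can be pictured with a small sketch. This is not the authors' implementation: the function name `reweight_views`, the choice of a sigmoid over per-view logits as the weight, and the final concatenation are all assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(x):
    """Map a real-valued logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def reweight_views(views, logits):
    """Scale each view's feature vector by its score, then concatenate.

    views:  dict of view name -> 1-D feature vector (hypothetical layout)
    logits: dict of view name -> scalar coarse-prediction logit (hypothetical)
    """
    weighted = []
    for name in sorted(views):  # fixed order so the output layout is stable
        weight = sigmoid(logits[name])
        weighted.append(weight * np.asarray(views[name], dtype=float))
    return np.concatenate(weighted)

# Three views, as named in the abstract; the values are placeholders.
views = {
    "text": np.ones(4),
    "image_pattern": np.ones(4),
    "image_semantics": np.ones(4),
}
logits = {"text": 0.0, "image_pattern": 0.0, "image_semantics": 0.0}
fused = reweight_views(views, logits)
```

With all logits at zero, every view receives a weight of 0.5, so no view dominates the fused vector; a larger logit for one view would increase its share.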

10.4 | CV | Jul 31, 2023

DRAW: Defending Camera-shooted RAW against Image Manipulation

Xiaoxiao Hu, Qichao Ying, Zhenxing Qian et al.

RAW files are the initial measurement of scene radiance widely used in most cameras, and the ubiquitous RGB images are converted from RAW data through Image Signal Processing (ISP) pipelines. Nowadays, digital images are at risk of being nefariously manipulated. Inspired by the fact that innate immunity is the body's first line of defense, we propose DRAW, a novel scheme of defending images against manipulation by protecting their sources, i.e., camera-shot RAWs. Specifically, we design a lightweight Multi-frequency Partial Fusion Network (MPF-Net), which uses frequency learning and partial feature fusion to remain friendly to devices with limited computing resources. It introduces invisible watermarks as a protective signal into the RAW data. The protection capability not only transfers to the rendered RGB images regardless of the applied ISP pipeline, but is also resilient to post-processing operations such as blurring or compression. Once the image is manipulated, we can accurately identify the forged areas with a localization network. Extensive experiments on several well-known RAW datasets, e.g., RAISE, FiveK and SIDD, indicate the effectiveness of our method. We hope that this technique can be used in future cameras as an option for image protection, which could effectively restrict image manipulation at the source.

3.7 | CV | Sep 5, 2024

ScreenMark: Watermarking Arbitrary Visual Content on Screen

Xiujian Liang, Gaozhi Liu, Yichao Si et al.

Digital watermarking has shown its effectiveness in protecting multimedia content. However, existing watermarking methods are predominantly tailored for specific media types, rendering them less effective for the protection of content displayed on computer screens, which is often multi-modal and dynamic. Visual Screen Content (VSC) is particularly susceptible to theft and leakage through screenshots, a vulnerability that current watermarking methods fail to adequately address. To address these challenges, we propose ScreenMark, a robust and practical watermarking method designed specifically for arbitrary VSC protection. ScreenMark utilizes a three-stage progressive watermarking framework. Initially, inspired by diffusion principles, we initialize the mutual transformation between regular watermark information and irregular watermark patterns. Subsequently, these patterns are integrated with screen content using a pre-multiplication alpha blending technique, supported by a pre-trained screen decoder for accurate watermark retrieval. The progressively complex distorter enhances the robustness of the watermark in real-world screenshot scenarios. Finally, the model undergoes fine-tuning guided by a joint-level distorter to ensure optimal performance. To validate the effectiveness of ScreenMark, we compiled a dataset comprising 100,000 screenshots from various devices and resolutions. Extensive experiments on different datasets confirm the superior robustness, imperceptibility, and practical applicability of the method.
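
For readers unfamiliar with the blending step named in this abstract, the following is a minimal sketch of standard pre-multiplied alpha compositing (the "over" operator). It only illustrates the general technique; how ScreenMark generates its patterns or chooses alpha is not shown here, and the function names, array shapes, and alpha value are assumptions.

```python
import numpy as np

def premultiply(pattern_rgb, alpha):
    """Scale the pattern's colour channels by its per-pixel alpha."""
    return pattern_rgb * alpha[..., None]

def blend_premultiplied(screen_rgb, pattern_premult, alpha):
    """Composite a pre-multiplied pattern over the screen content.

    out = pattern_premult + (1 - alpha) * screen
    All arrays hold floats in [0, 1]; alpha has shape (H, W).
    """
    out = pattern_premult + (1.0 - alpha[..., None]) * screen_rgb
    return np.clip(out, 0.0, 1.0)

# Placeholder data: mid-grey screen, white pattern, faint overlay.
screen = np.full((2, 2, 3), 0.5)
pattern = np.ones((2, 2, 3))
alpha = np.full((2, 2), 0.1)  # hypothetical opacity, kept low for imperceptibility

marked = blend_premultiplied(screen, premultiply(pattern, alpha), alpha)
```

Because the pattern is multiplied by alpha ahead of time, the blend itself needs only one multiply and one add per pixel, and an alpha of zero returns the screen content unchanged.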

6.5 | CV | Jan 1, 2024

From Covert Hiding to Visual Editing: Robust Generative Video Steganography

Xueying Mao, Xiaoxiao Hu, Wanli Peng et al.

Traditional video steganography methods are based on modifying the covert space for embedding, whereas we propose an innovative approach that embeds secret messages within semantic features during the video editing process. Although existing traditional video steganography methods display a certain level of security and embedding capacity, they lack adequate robustness against common distortions in online social networks (OSNs). In this paper, we introduce an end-to-end robust generative video steganography network (RoGVS), which achieves visual editing by modifying the semantic features of videos to embed secret messages. We employ a face-swapping scenario to showcase the visual editing effects. We first design a secret message embedding module to adaptively hide secret messages in the semantic features of videos. Extensive experiments on facial video datasets show that the proposed RoGVS method outperforms existing video and image steganography techniques in terms of both robustness and capacity.

8.0 | CV | Oct 12, 2021

RWN: Robust Watermarking Network for Image Cropping Localization

Qichao Ying, Xiaoxiao Hu, Xiangyu Zhang et al.

Image cropping can be maliciously used to manipulate the layout of an image and alter the underlying meaning. Previous image crop detection schemes only predict whether an image has been cropped, ignoring which part of the image was cropped. This paper presents a novel robust watermarking network (RWN) for image crop localization. We train an anti-crop processor (ACP) that embeds a watermark into a target image. The visually indistinguishable protected image is then posted on the social network instead of the original image. At the recipient's side, ACP extracts the watermark from the attacked image, and we conduct feature matching on the original and extracted watermarks to locate the position of the crop in the original image plane. We further extend our scheme to detect tampering attacks on the attacked image. In addition, we explore a simple yet efficient method (JPEG-Mixup) to improve the generalization of JPEG robustness. According to our comprehensive experiments, RWN is the first to provide high-accuracy and robust image crop localization. Moreover, the accuracy of tamper detection is comparable with many state-of-the-art passive methods.