Xiaoxiao Hu

CV
h-index37
7papers
163citations
Novelty46%
AI Score30

7 Papers

CVJun 12, 2022
Bootstrapping Multi-view Representations for Fake News Detection

Qichao Ying, Xiaoxiao Hu, Yangming Zhou et al.

Previous researches on multimedia fake news detection include a series of complex feature extraction and fusion networks to gather useful information from the news. However, how cross-modal consistency relates to the fidelity of news and how features from different modalities affect the decision-making are still open questions. This paper presents a novel scheme of Bootstrapping Multi-view Representations (BMR) for fake news detection. Given a multi-modal news, we extract representations respectively from the views of the text, the image pattern and the image semantics. Improved Multi-gate Mixture-of-Expert networks (iMMoE) are proposed for feature refinement and fusion. Representations from each view are separately used to coarsely predict the fidelity of the whole news, and the multimodal representations are able to predict the cross-modal consistency. With the prediction scores, we reweigh each view of the representations and bootstrap them for fake news detection. Extensive experiments conducted on typical fake news detection datasets prove that the proposed BMR outperforms state-of-the-art schemes.

CVJul 31, 2023
DRAW: Defending Camera-shooted RAW against Image Manipulation

Xiaoxiao Hu, Qichao Ying, Zhenxing Qian et al.

RAW files are the initial measurement of scene radiance widely used in most cameras, and the ubiquitously-used RGB images are converted from RAW data through Image Signal Processing (ISP) pipelines. Nowadays, digital images are risky of being nefariously manipulated. Inspired by the fact that innate immunity is the first line of body defense, we propose DRAW, a novel scheme of defending images against manipulation by protecting their sources, i.e., camera-shooted RAWs. Specifically, we design a lightweight Multi-frequency Partial Fusion Network (MPF-Net) friendly to devices with limited computing resources by frequency learning and partial feature fusion. It introduces invisible watermarks as protective signal into the RAW data. The protection capability can not only be transferred into the rendered RGB images regardless of the applied ISP pipeline, but also is resilient to post-processing operations such as blurring or compression. Once the image is manipulated, we can accurately identify the forged areas with a localization network. Extensive experiments on several famous RAW datasets, e.g., RAISE, FiveK and SIDD, indicate the effectiveness of our method. We hope that this technique can be used in future cameras as an option for image protection, which could effectively restrict image manipulation at the source.

CVSep 5, 2024
ScreenMark: Watermarking Arbitrary Visual Content on Screen

Xiujian Liang, Gaozhi Liu, Yichao Si et al.

Digital watermarking has shown its effectiveness in protecting multimedia content. However, existing watermarking is predominantly tailored for specific media types, rendering them less effective for the protection of content displayed on computer screens, which is often multi-modal and dynamic. Visual Screen Content (VSC), is particularly susceptible to theft and leakage through screenshots, a vulnerability that current watermarking methods fail to adequately address.To address these challenges, we propose ScreenMark, a robust and practical watermarking method designed specifically for arbitrary VSC protection. ScreenMark utilizes a three-stage progressive watermarking framework. Initially, inspired by diffusion principles, we initialize the mutual transformation between regular watermark information and irregular watermark patterns. Subsequently, these patterns are integrated with screen content using a pre-multiplication alpha blending technique, supported by a pre-trained screen decoder for accurate watermark retrieval. The progressively complex distorter enhances the robustness of the watermark in real-world screenshot scenarios. Finally, the model undergoes fine-tuning guided by a joint-level distorter to ensure optimal performance. To validate the effectiveness of ScreenMark, we compiled a dataset comprising 100,000 screenshots from various devices and resolutions. Extensive experiments on different datasets confirm the superior robustness, imperceptibility, and practical applicability of the method.

IVJun 6, 2022
Image Protection for Robust Cropping Localization and Recovery

Qichao Ying, Hang Zhou, Xiaoxiao Hu et al.

Existing image cropping detection schemes ignore that recovering the cropped-out contents can unveil the purpose of the behaved cropping attack. This paper presents \textbf{CLR}-Net, a novel image protection scheme addressing the combined challenge of image \textbf{C}ropping \textbf{L}ocalization and \textbf{R}ecovery. We first protect the original image by introducing imperceptible perturbations. Then, typical image post-processing attacks are simulated to erode the protected image. On the recipient's side, we predict the cropping mask and recover the original image. Besides, we propose a novel \textbf{F}ine-\textbf{G}rained generative \textbf{JPEG} simulator (FG-JPEG) as well as a feature alignment network to improve the real-world robustness. Comprehensive experiments prove that the quality of the recovered image and the accuracy of crop localization are both satisfactory.

CVJan 1, 2024
From Covert Hiding to Visual Editing: Robust Generative Video Steganography

Xueying Mao, Xiaoxiao Hu, Wanli Peng et al.

Traditional video steganography methods are based on modifying the covert space for embedding, whereas we propose an innovative approach that embeds secret message within semantic feature for steganography during the video editing process. Although existing traditional video steganography methods display a certain level of security and embedding capacity, they lack adequate robustness against common distortions in online social networks (OSNs). In this paper, we introduce an end-to-end robust generative video steganography network (RoGVS), which achieves visual editing by modifying semantic feature of videos to embed secret message. We employ face-swapping scenario to showcase the visual editing effects. We first design a secret message embedding module to adaptively hide secret message into the semantic feature of videos. Extensive experiments display that the proposed RoGVS method applied to facial video datasets demonstrate its superiority over existing video and image steganography techniques in terms of both robustness and capacity.

CVJan 29, 2025
An Exceptional Dataset For Rare Pancreatic Tumor Segmentation

Wenqi Li, Yingli Chen, Keyang Zhou et al.

Pancreatic NEuroendocrine Tumors (pNETs) are very rare endocrine neoplasms that account for less than 5% of all pancreatic malignancies, with an incidence of only 1-1.5 cases per 100,000. Early detection of pNETs is critical for improving patient survival, but the rarity of pNETs makes segmenting them from CT a very challenging problem. So far, there has not been a dataset specifically for pNETs available to researchers. To address this issue, we propose a pNETs dataset, a well-annotated Contrast-Enhanced Computed Tomography (CECT) dataset focused exclusively on Pancreatic Neuroendocrine Tumors, containing data from 469 patients. This is the first dataset solely dedicated to pNETs, distinguishing it from previous collections. Additionally, we provide the baseline detection networks with a new slice-wise weight loss function designed for the UNet-based model, improving the overall pNET segmentation performance. We hope that our dataset can enhance the understanding and diagnosis of pNET Tumors within the medical community, facilitate the development of more accurate diagnostic tools, and ultimately improve patient outcomes and advance the field of oncology.

CVOct 12, 2021
RWN: Robust Watermarking Network for Image Cropping Localization

Qichao Ying, Xiaoxiao Hu, Xiangyu Zhang et al.

Image cropping can be maliciously used to manipulate the layout of an image and alter the underlying meaning. Previous image crop detection schemes only predicts whether an image has been cropped, ignoring which part of the image is cropped. This paper presents a novel robust watermarking network (RWN) for image crop localization. We train an anti-crop processor (ACP) that embeds a watermark into a target image. The visually indistinguishable protected image is then posted on the social network instead of the original image. At the recipient's side, ACP extracts the watermark from the attacked image, and we conduct feature matching on the original and extracted watermark to locate the position of the crop in the original image plane. We further extend our scheme to detect tampering attack on the attacked image. Besides, we explore a simple yet efficient method (JPEG-Mixup) to improve the generalization of JPEG robustness. According to our comprehensive experiments, RWN is the first to provide high-accuracy and robust image crop localization. Besides, the accuracy of tamper detection is comparable with many state-of-the-art passive-based methods.