CVMar 21Code
SWIFT: Sliding Window Reconstruction for Few-Shot Training-Free Generated Video AttributionChao Wang, Zijin Yang, Yaofei Wang et al.
Recent advancements in video generation technologies have been significant, resulting in their widespread application across multiple domains. However, concerns have been mounting over the potential misuse of generated content. Tracing the origin of generated videos has become crucial to mitigate potential misuse and identify responsible parties. Existing video attribution methods require additional operations or the training of source attribution models, which may degrade video quality or necessitate large amounts of training samples. To address these challenges, we define for the first time the "few-shot training-free generated video attribution" task and propose SWIFT, which is tightly integrated with the temporal characteristics of the video. By leveraging the "Pixel Frames(many) to Latent Frame(one)" temporal mapping within each video chunk, SWIFT applies a fixed-length sliding window to perform two distinct reconstructions: normal and corrupted. The variation in the losses between two reconstructions is then used as an attribution signal. We conducted an extensive evaluation of five state-of-the-art (SOTA) video generation models. Experimental results show that SWIFT achieves over 90% average attribution accuracy with merely 20 video samples across all models and even enables zero-shot attribution for HunyuanVideo, EasyAnimate, and Wan2.2. Our source code is available at https://github.com/wangchao0708/SWIFT.
CRApr 28
ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic SteganographyYaofei Wang, Rui Wang, Weilong Pang et al.
Generative linguistic steganography (GLS) enables covert communication by embedding secret messages into the natural language generation process. In practical deployment, however, GLS is vulnerable to tokenization ambiguity: the same surface text may be re-tokenized into a different token sequence at the receiver, breaking the shared decoding state between the communicating parties so that a single local mismatch can propagate into complete extraction failure. Existing solutions either remove ambiguous tokens -- distorting the generation distribution and compromising security -- or preserve the distribution at the cost of substantially reduced embedding capacity or prohibitive runtime overhead. To address this issue, we propose ReTokSync (Re-Tokenization Synchronization), a self-synchronizing disambiguation framework that monitors the receiver-view tokenization during generation and triggers a corrective reset only when ambiguity actually occurs. By confining the effect of tokenization ambiguity to sparse residual bit errors rather than global desynchronization, ReTokSync leaves ambiguity-free positions entirely untouched and remains compatible with the underlying steganographic algorithm. Experiments on both English and Chinese settings show that ReTokSync stays closest to the steganographic baseline in distributional security (zero KL divergence), text quality, embedding capacity, and runtime, while achieving extraction accuracy above 99.7\%. Building on this property, we further develop a two-channel covert communication mechanism in which ReTokSync serves as the primary channel and a reliable auxiliary channel corrects the remaining errors, achieving 100\% end-to-end recovery across all evaluated configurations.
CVMar 9, 2022
Image Steganography based on Style TransferDonghui Hu, Yu Zhang, Cong Yu et al.
Image steganography is the art and science of using images as cover for covert communications. With the development of neural networks, traditional image steganography is more likely to be detected by deep learning-based steganalysis. To improve upon this, we propose image steganography network based on style transfer, and the embedding of secret messages can be disguised as image stylization. We embed secret information while transforming the content image style. In latent space, the secret information is integrated into the latent representation of the cover image to generate the stego images, which are indistinguishable from normal stylized images. It is an end-to-end unsupervised model without pre-training. Extensive experiments on the benchmark dataset demonstrate the reliability, quality and security of stego images generated by our steganographic network.
CVJul 25, 2025
AEDR: Training-Free AI-Generated Image Attribution via Autoencoder Double-ReconstructionChao Wang, Kejiang Chen, Zijin Yang et al.
The rapid advancement of image-generation technologies has made it possible for anyone to create photorealistic images using generative models, raising significant security concerns. To mitigate malicious use, tracing the origin of such images is essential. Reconstruction-based attribution methods offer a promising solution, but they often suffer from reduced accuracy and high computational costs when applied to state-of-the-art (SOTA) models. To address these challenges, we propose AEDR (AutoEncoder Double-Reconstruction), a novel training-free attribution method designed for generative models with continuous autoencoders. Unlike existing reconstruction-based approaches that rely on the value of a single reconstruction loss, AEDR performs two consecutive reconstructions using the model's autoencoder, and adopts the ratio of these two reconstruction losses as the attribution signal. This signal is further calibrated using the image homogeneity metric to improve accuracy, which inherently cancels out absolute biases caused by image complexity, with autoencoder-based reconstruction ensuring superior computational efficiency. Experiments on eight top latent diffusion models show that AEDR achieves 25.5% higher attribution accuracy than existing reconstruction-based methods, while requiring only 1% of the computational time.
CRFeb 12, 2025
Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by WatermarkYuhang Cai, Yaofei Wang, Donghui Hu et al.
The development of large language models (LLMs) has raised concerns about potential misuse. One practical solution is to embed a watermark in the text, allowing ownership verification through watermark extraction. Existing methods primarily focus on defending against modification attacks, often neglecting other spoofing attacks. For example, attackers can alter the watermarked text to produce harmful content without compromising the presence of the watermark, which could lead to false attribution of this malicious content to the LLM. This situation poses a serious threat to the LLMs service providers and highlights the significance of achieving modification detection and generated-text detection simultaneously. Therefore, we propose a technique to detect modifications in text for unbiased watermark which is sensitive to modification. We introduce a new metric called ``discarded tokens", which measures the number of tokens not included in watermark detection. When a modification occurs, this metric changes and can serve as evidence of the modification. Additionally, we improve the watermark detection process and introduce a novel method for unbiased watermark. Our experiments demonstrate that we can achieve effective dual detection capabilities: modification detection and generated-text detection by watermark.
CROct 17, 2021
Improving Dither Modulation based Robust Steganography by Overflow SuppressionKai Zeng, Kejiang Chen, Yaofei Wang et al.
Nowadays, people are sharing their pictures on online social networks (OSNs), so OSN is a good platform for Steganography. But OSNs usually perform JPEG compression on the uploaded image, which will invalidate most of the existing steganography algorithms. Recently, some works try to design robust steganography which can resist JPEG compression, such as Dither Modulation-based robust Adaptive Steganography (DMAS) and Generalized dither Modulation-based robust Adaptive Steganography (GMAS). They relieve the problem that the receivers cannot extract the message correctly when the quality factor of channel JPEG compression is larger than that of cover images. However, they only can realize limited resistance to detection and compression due to robust domain selection. To overcome this problem, we meticulously explore three lossy operations in the JPEG recompression and discover that the key problem is spatial overflow. Then two preprocessing methods Overall Scaling (OS) and Specific Truncation (ST) are presented to remove overflow before message embedding as well as generate a reference image. The reference image is employed as the guidance to build asymmetric distortion for removing overflow during embedding. Experimental results show that the proposed methods significantly surpass GMAS in terms of security and achieve comparable robustness.