Tackle CSM in JPEG Steganalysis with Data Adaptation

Rony Abecidan, Vincent Itier, Jérémie Boulanger, Patrick Bas, Tomáš Pevný

arXiv:2605.21523


AI Analysis

For practitioners in steganalysis who face realistic CSM scenarios with limited unlabeled data and unknown processing, TADA provides a practical solution to improve detection accuracy.

TADA addresses Cover Source Mismatch (CSM) in JPEG steganalysis by learning to emulate unknown processing pipelines from a small unlabeled target set, achieving substantial gains in robustness and operational generalization over baselines.

Steganalysis models excel on benchmark datasets but struggle in the wild when the analyzed images are produced by a processing pipeline unseen during training. This problem, known as Cover Source Mismatch (CSM), is particularly hard in realistic settings where practitioners (1) have access to only a small, unlabeled dataset, (2) are unsure of the processing techniques applied to these images, and (3) lack information on the proportion of covers and stegos in that set. To address this challenge, we introduce TADA (Target Alignment through Data Adaptation), a framework that learns to emulate the unknown processing pipeline from a small unlabeled target set. The emulator is trained with a loss combining residual covariance alignment, residual distribution matching, and an $\ell^2$ term constraining it to produce realistic images. Across toy and operational targets, TADA yields substantial gains in robustness to CSM and improves operational generalization compared to strong holistic and atomistic baselines. Additional resources are available at https://github.com/RonyAbecidan/TADA
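To make the three-term loss in the abstract more concrete, here is a minimal NumPy sketch of how such an objective could be assembled. It is not the authors' implementation, and every specific in it is an assumption made for illustration: the residual is a hypothetical 3x3 mean-subtraction high-pass filter, covariance alignment is taken as a CORAL-style squared Frobenius distance, distribution matching is stood in for by a biased RBF-kernel MMD, the $\ell^2$ term is assumed to compare the emulated image with its source input, and all function names and weights (`w_cov`, `w_mmd`, `w_l2`) are invented here.

```python
import numpy as np

def highpass_residual(img):
    """Hypothetical residual: image minus its 3x3 local mean (valid region only)."""
    h, w = img.shape
    acc = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            acc += img[dy:dy + h - 2, dx:dx + w - 2]
    return img[1:-1, 1:-1] - acc / 9.0

def residual_features(batch, patch=4):
    """Turn each residual map into rows of flattened non-overlapping patches."""
    rows = []
    for img in batch:
        r = highpass_residual(img)
        hh = (r.shape[0] // patch) * patch
        ww = (r.shape[1] // patch) * patch
        r = r[:hh, :ww].reshape(hh // patch, patch, ww // patch, patch)
        rows.append(r.transpose(0, 2, 1, 3).reshape(-1, patch * patch))
    return np.concatenate(rows, axis=0)

def covariance_alignment(fa, fb):
    """Squared Frobenius distance between feature covariances (CORAL-style)."""
    ca = np.cov(fa, rowvar=False)
    cb = np.cov(fb, rowvar=False)
    return float(np.sum((ca - cb) ** 2))

def rbf_mmd2(fa, fb, sigma=1.0):
    """Biased estimate of the squared MMD with an RBF kernel (assumed stand-in)."""
    def kernel(x, y):
        d2 = np.sum(x ** 2, 1)[:, None] + np.sum(y ** 2, 1)[None, :] - 2 * x @ y.T
        return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))
    return float(kernel(fa, fa).mean() + kernel(fb, fb).mean()
                 - 2 * kernel(fa, fb).mean())

def combined_loss(emulated, source, target, w_cov=1.0, w_mmd=1.0, w_l2=1.0):
    """Weighted sum of the three terms; the weights are illustrative only."""
    fe = residual_features(emulated)
    ft = residual_features(target)
    # Assumed realism constraint: keep the emulated image close to its input.
    l2 = float(np.mean((emulated - source) ** 2))
    return (w_cov * covariance_alignment(fe, ft)
            + w_mmd * rbf_mmd2(fe, ft)
            + w_l2 * l2)

rng = np.random.default_rng(0)
source = rng.normal(size=(4, 16, 16))        # stand-in for source images
target = 2.0 * rng.normal(size=(4, 16, 16))  # stand-in for unlabeled target images
loss_identity = combined_loss(source, source, source)
loss_mismatch = combined_loss(source, source, target)
```

In this toy setup the loss vanishes when the emulated, source, and target batches coincide and becomes positive once the target residual statistics differ. In an actual training loop the same terms would be written in a differentiable framework so that gradients can reach the emulator's parameters.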
