Unsupervised Steganalysis Based on Artificial Training Sets
This work addresses the challenge of steganalysis without labeled training data. The contribution is incremental: it builds on existing supervised techniques but sidesteps the problem of cover source mismatch.
The paper tackles the problem of unsupervised steganalysis for detecting hidden messages in images, proposing a method that combines artificial training sets with supervised classification and shows experimental results outperforming previous methods in most tested cases.
In this paper, an unsupervised steganalysis method that combines artificial training sets and supervised classification is proposed. We provide a formal framework for unsupervised classification of stego and cover images in the typical situation of targeted steganalysis (i.e., for a known algorithm and approximate embedding bit rate). We also present a complete set of experiments using 1) eight different image databases, 2) image features based on Rich Models, and 3) three different embedding algorithms: Least Significant Bit (LSB) matching, Highly undetectable steganography (HUGO) and Wavelet Obtained Weights (WOW). We show that the experimental results outperform previous methods based on Rich Models in the majority of the tested cases. At the same time, the proposed approach bypasses the problem of Cover Source Mismatch (when the embedding algorithm and bit rate are known), since it removes the need for a training database when we have a large enough testing set. Furthermore, we provide a generic proof of the proposed framework in the machine learning context. Hence, the results of this paper could be extended to other classification problems similar to steganalysis.