Subsequent embedding in targeted image steganalysis: Theoretical framework and practical applications
This work addresses the problem of data mismatch and uncertainty in steganalysis for security applications, representing an incremental advancement by providing a theoretical basis for an existing strategy.
The paper tackles the challenge of applying machine-learning steganalysis to new data sources by introducing a theoretical framework for subsequent embedding, based on the 'directionality' property of features, and demonstrates practical applications that improve detection in realistic scenarios.
Steganalysis is a collection of techniques used to detect whether secret information is embedded in a carrier using steganography. Most existing steganalytic methods are based on machine learning, which typically requires training a classifier with "laboratory" data. However, applying machine-learning classification to a new source of data is challenging, since there is typically a mismatch between the training and the testing sets. In addition, other sources of uncertainty affect the steganalytic process, including the mismatch between the targeted and the true steganographic algorithms, unknown parameters (such as the message length), and even a mixture of several algorithms and parameters, which would constitute a realistic scenario. This paper presents subsequent embedding as a valuable strategy that can be incorporated into modern steganalysis. Although this solution has been applied in previous works, a theoretical basis for it was missing. Here, we fill this research gap by introducing the "directionality" property of features with respect to data embedding. With the strategy supported by a consistent theoretical framework, new practical applications are also described and tested against standard steganography, moving steganalysis closer to real-world conditions.
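
As a loose illustration of the idea, the sketch below embeds a random message into a synthetic image and then embeds again into the result, tracking one feature across the three versions. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic gradient cover, LSB matching as the embedding algorithm, the 0.5 payload, the helper names `lsb_matching_embed` and `toy_feature`, and the reading of "directionality" as a feature that keeps moving in the same direction when a stego image is embedded a second time.

```python
import numpy as np


def lsb_matching_embed(image, payload, rng):
    """Hypothetical helper: embed random bits with LSB matching (+/-1 changes).

    `payload` is the fraction of pixels that carry one message bit.
    """
    flat = image.astype(np.int64).ravel().copy()
    n = int(payload * flat.size)
    idx = rng.choice(flat.size, size=n, replace=False)  # pixels that carry a bit
    bits = rng.integers(0, 2, size=n)                   # random message bits
    mismatch = (flat[idx] & 1) != bits                  # pixels whose LSB must change
    step = rng.choice([-1, 1], size=n)                  # random +1 or -1
    step[flat[idx] == 0] = 1                            # stay inside [0, 255]
    step[flat[idx] == 255] = -1
    flat[idx] += np.where(mismatch, step, 0)
    return flat.reshape(image.shape)


def toy_feature(image):
    """Hypothetical feature: mean squared difference of horizontal neighbours."""
    diff = np.diff(image.astype(np.int64), axis=1)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)

# Assumed cover: a smooth synthetic gradient, not a real photograph.
cover = np.tile(np.linspace(60, 190, 64), (64, 1)).astype(np.int64)

stego = lsb_matching_embed(cover, payload=0.5, rng=rng)         # first embedding
double_stego = lsb_matching_embed(stego, payload=0.5, rng=rng)  # subsequent embedding

f_cover = toy_feature(cover)
f_stego = toy_feature(stego)
f_double = toy_feature(double_stego)

print("cover:", f_cover, "stego:", f_stego, "double stego:", f_double)
```

In this toy setting each embedding adds independent noise, so the feature is expected to grow from cover to stego and again from stego to double stego. Real steganalytic feature sets and adaptive embedding schemes are far more complex, so the sketch only conveys the intuition.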