How to augment a small learning set for improving the performances of a CNN-based steganalyzer?
This work addresses the challenge of limited training data for steganalysis practitioners, but it is incremental as it builds on prior research to refine database augmentation strategies.
The paper studies how to augment a small learning set to improve CNN-based steganalysis. It finds that carefully selecting the added images according to criteria such as the camera model and the processing applied to the images can improve classification accuracy, and it reports measurable gains under specific experimental protocols.
Deep learning and convolutional neural networks (CNNs) have been used intensively in many image processing tasks in recent years. In steganalysis, CNNs achieve state-of-the-art results. The performance of such networks often depends on the size of their learning database. An obvious preliminary assumption would be that "the bigger the database, the better the results". However, it appears that care must be taken when increasing the database size if one wishes to improve the classification accuracy, i.e., enhance the steganalysis efficiency. To our knowledge, no study has been performed on how enriching a learning database affects steganalysis performance. What kind of images can be added to the initial learning set? What are the sensitive criteria: the camera models used for acquiring the images, the processing applied to the images, the proportions of cameras in the database, etc.? This article continues the work carried out in a previous paper and explores ways to improve the performance of CNNs. It aims at studying the effects of "base augmentation" on the performance of steganalysis using a CNN. We present the results of this study using various experimental protocols and various databases to define good practices in base augmentation for steganalysis.