LG AI CR CV
Mar 9, 2023

Mark My Words: Dangers of Watermarked Images in ImageNet

Kirill Bykov, Klaus-Robert Müller, Marina M. -C. Höhne

arXiv:2303.05498v1 | 7.77 citations | h-index: 19

Originality: Incremental advance

AI Analysis

This paper addresses a data-quality issue in computer vision that can lead to biased models. The advance is incremental, since it builds on prior findings about watermarks in ImageNet.

The paper investigates how watermarks in ImageNet cause pre-trained networks to learn spurious correlations, revealing that multiple classes like 'monitor' and 'broom' are affected, and proposes a method to mitigate this by ignoring susceptible feature encodings.

The utilization of pre-trained networks, especially those trained on ImageNet, has become a common practice in Computer Vision. However, prior research has indicated that a significant number of images in the ImageNet dataset contain watermarks, making pre-trained networks susceptible to learning artifacts such as watermark patterns within their latent spaces. In this paper, we aim to assess the extent to which popular pre-trained architectures display such behavior and to determine which classes are most affected. Additionally, we examine the impact of watermarks on the extracted features. Contrary to the popular belief that the Chinese logographic watermarks impact the "carton" class only, our analysis reveals that a variety of ImageNet classes, such as "monitor", "broom", "apron" and "safe" rely on spurious correlations. Finally, we propose a simple approach to mitigate this issue in fine-tuned networks by ignoring the encodings from the feature-extractor layer of ImageNet pre-trained networks that are most susceptible to watermark imprints.
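
The mitigation idea in the last sentence of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each image has a feature vector from the pre-trained extractor both with and without an overlaid watermark, and it scores each feature dimension by its mean absolute change. The scoring rule, the function names (`watermark_sensitivity`, `most_susceptible`, `mask_features`), and the toy numbers are all hypothetical and chosen only for illustration.

```python
from statistics import mean

def watermark_sensitivity(clean_feats, marked_feats):
    """Mean absolute change per feature dimension when a watermark is overlaid.

    Assumption: this simple score stands in for whatever susceptibility
    measure the paper actually uses.
    """
    n_dims = len(clean_feats[0])
    return [
        mean(abs(m[d] - c[d]) for c, m in zip(clean_feats, marked_feats))
        for d in range(n_dims)
    ]

def most_susceptible(scores, k):
    """Indices of the k dimensions with the largest sensitivity scores."""
    return sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)[:k]

def mask_features(feats, ignored):
    """Zero out the ignored dimensions so a fine-tuned head cannot rely on them."""
    ignored = set(ignored)
    return [[0.0 if d in ignored else v for d, v in enumerate(row)] for row in feats]

# Toy data (hypothetical): 3 images, 4 feature dimensions.
# Dimension 2 reacts strongly to the watermark; the others barely move.
clean = [[0.1, 0.5, 0.0, 0.3], [0.2, 0.4, 0.1, 0.3], [0.0, 0.6, 0.0, 0.2]]
marked = [[0.1, 0.5, 0.9, 0.3], [0.2, 0.5, 1.1, 0.3], [0.0, 0.6, 0.8, 0.2]]

scores = watermark_sensitivity(clean, marked)
ignored = most_susceptible(scores, k=1)
masked = mask_features(marked, ignored)
```

In a real fine-tuning setup the same mask would be applied to the extractor's output before the task-specific head; the number of ignored dimensions `k` is a tuning choice that the abstract does not specify.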

