CV CR LG · Feb 21, 2022

A Self-Supervised Descriptor for Image Copy Detection

Ed Pizzi, Sreya Dutta Roy, Sugosh Nagavara Ravindra, Priya Goyal, Matthijs Douze

arXiv:2202.10261v231.8209 citations · Has Code

Originality: Incremental advance

AI Analysis

The work addresses content moderation for platforms that need scalable detection of copied images, though it appears incremental because it builds on existing self-supervised methods.

The paper tackles image copy detection for content moderation by introducing SSCD, a model that adapts self-supervised contrastive learning with architectural changes and an entropy regularization term. It reports a 48% absolute improvement over SimCLR descriptors on the DISC2021 benchmark.

Image copy detection is an important task for content moderation. We introduce SSCD, a model that builds on a recent self-supervised contrastive training objective. We adapt this method to the copy detection task by changing the architecture and training objective, including a pooling operator from the instance matching literature, and adapting contrastive learning to augmentations that combine images. Our approach relies on an entropy regularization term, promoting consistent separation between descriptor vectors, and we demonstrate that this significantly improves copy detection accuracy. Our method produces a compact descriptor vector, suitable for real-world web scale applications. Statistical information from a background image distribution can be incorporated into the descriptor. On the recent DISC2021 benchmark, SSCD is shown to outperform both baseline copy detection models and self-supervised architectures designed for image classification by huge margins, in all settings. For example, SSCD out-performs SimCLR descriptors by 48% absolute. Code is available at https://github.com/facebookresearch/sscd-copy-detection.
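
The abstract does not spell out the entropy regularization term, so the following is only a minimal sketch of one common way to promote separation between descriptor vectors: a nearest-neighbor (Kozachenko-Leonenko style) entropy penalty on L2-normalized descriptors. That the paper uses this exact form is an assumption, and the function names `l2_normalize` and `entropy_regularizer` and the `eps` parameter are hypothetical, chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def entropy_regularizer(descriptors, eps=1e-8):
    """Nearest-neighbor entropy penalty (assumed form, not taken from the paper).

    Returns the negative mean log-distance from each descriptor to its
    nearest other descriptor in the batch. The value is large when
    descriptors crowd together and small when they are spread apart,
    so adding it to a training loss pushes descriptors away from
    their nearest neighbors.
    """
    n = len(descriptors)
    total = 0.0
    for i in range(n):
        # Distance to the closest descriptor other than descriptor i itself.
        nearest = min(
            math.dist(descriptors[i], descriptors[j])
            for j in range(n)
            if j != i
        )
        total += math.log(nearest + eps)
    return -total / n


# Toy batches: three nearly identical descriptors vs. three well-separated ones.
clustered = [l2_normalize(v) for v in ([1.0, 0.0], [1.0, 0.01], [1.0, -0.01])]
spread = [l2_normalize(v) for v in ([1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866])]

print("clustered penalty:", entropy_regularizer(clustered))
print("spread penalty:", entropy_regularizer(spread))
```

In this toy setup the clustered batch receives the larger penalty, which is the behavior a separation-promoting term should have. In training, such a term would be added to the contrastive loss with a weighting factor; the weight and the batch construction are not specified in the text above.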
