Inverse Problems Leveraging Pre-trained Contrastive Representations
This paper addresses the challenge of learning robust representations for corrupted data in computer vision, though the contribution is incremental because it builds on existing pre-trained models.
The paper tackles the problem of recovering clean-image representations from corrupted inputs using a pre-trained network such as CLIP. On a subset of ImageNet, it achieves higher accuracy than end-to-end baselines when classifying images distorted by blurring, additive noise, or random pixel masking.
We study a new family of inverse problems for recovering representations of corrupted data. We assume access to a pre-trained representation learning network R(x) that operates on clean images, like CLIP. The problem is to recover the representation R(x) of an image x when we are given only a corrupted version A(x), for some known forward operator A. We propose a supervised inversion method that uses a contrastive objective to obtain excellent representations for highly corrupted images. Using a linear probe on our robust representations, we achieve higher accuracy than end-to-end supervised baselines when classifying images with various types of distortions, including blurring, additive noise, and random pixel masking. We evaluate on a subset of ImageNet and observe that our method is robust to varying levels of distortion. Our method outperforms end-to-end baselines, even with a fraction of the labeled data, across a wide range of forward operators.
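To make the setup concrete, the following is a minimal, dependency-free sketch of the two ingredients the abstract names: a known forward operator A and a contrastive objective that pulls the representation of a corrupted image toward the clean representation R(x) of the same image. The abstract does not specify the exact loss, so the InfoNCE-style form, the temperature value, and the function names (`random_mask`, `additive_noise`, `contrastive_loss`) are assumptions made for illustration, not the authors' implementation. Images and representations are plain lists of floats here; a real pipeline would use a tensor library and a trainable student network.

```python
import math
import random


def random_mask(pixels, keep_prob, rng):
    """Hypothetical forward operator A: keep each pixel with probability keep_prob, else zero it."""
    return [p if rng.random() < keep_prob else 0.0 for p in pixels]


def additive_noise(pixels, sigma, rng):
    """Hypothetical forward operator A: add i.i.d. Gaussian noise with standard deviation sigma."""
    return [p + rng.gauss(0.0, sigma) for p in pixels]


def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v)) + eps
    return [a / norm for a in v]


def contrastive_loss(student, teacher, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    student[i] stands for the representation computed from the corrupted image A(x_i);
    teacher[i] stands for the clean representation R(x_i) from the frozen pre-trained network.
    Row i of student is the positive for row i of teacher; all other rows act as negatives.
    """
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    total = 0.0
    for i, s_i in enumerate(s):
        # Cosine similarities between student i and every teacher row, scaled by temperature.
        logits = [sum(a * b for a, b in zip(s_i, t_j)) / temperature for t_j in t]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the matching (positive) pair.
        total += log_z - logits[i]
    return total / len(s)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for clean representations R(x_i): 8 random 16-dimensional vectors.
    teacher = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
    # A student that matches the teacher exactly versus one with mismatched pairing.
    print("aligned loss: ", contrastive_loss(teacher, teacher))
    print("shuffled loss:", contrastive_loss(teacher[::-1], teacher))
```

In this toy run the aligned pairing yields a much lower loss than the mismatched one, which is the signal a student network would be trained on. After training, the student's outputs would be fed to a linear probe for classification, as the abstract describes.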