Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs
This work addresses inherent ambiguities in real-world observations, such as blurred images, with applications like image retrieval. Its contribution is incremental in that it builds on prior proofs that contrastively trained encoders invert the data-generating process.
The paper tackles ambiguous inputs in contrastive learning by extending the InfoNCE objective and the encoders to predict latent distributions instead of points. It proves that these distributions recover the correct posteriors, including their aleatoric uncertainty, up to a rotation of the latent space. This enables calibrated uncertainty estimates and credible intervals in image retrieval.
Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated the image (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or only show a 2D view of a 3D object, so multiple latents could have generated them. This makes the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. We prove that these distributions recover the correct posteriors of the data-generating process, including its level of aleatoric uncertainty, up to a rotation of the latent space. In addition to providing calibrated uncertainty estimates, these posteriors allow the computation of credible intervals in image retrieval. These intervals comprise images with the same latent as a given query, subject to its uncertainty. Code is available at https://github.com/mkirchhof/Probabilistic_Contrastive_Learning
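
To make the idea of an InfoNCE objective over latent distributions concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes each encoder output is a pair (mean direction, concentration `kappa`) and averages the standard InfoNCE loss over Monte Carlo samples drawn from those distributions. All names (`sample_latent`, `info_nce`, `mc_info_nce`) are hypothetical, and the sampler (Gaussian noise projected onto the unit sphere) is a simple stand-in for whatever distribution family the method actually uses.

```python
import math
import random


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_latent(mean, kappa, rng):
    # Assumption: a stand-in sampler. Gaussian noise with std 1/sqrt(kappa)
    # around the mean, projected back onto the sphere. kappa must be > 0;
    # a larger kappa means a more certain (more concentrated) prediction.
    std = 1.0 / math.sqrt(kappa)
    return normalize([m + rng.gauss(0.0, std) for m in mean])


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE for point embeddings: cross-entropy of the positive."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[0]


def mc_info_nce(anchor_dist, positive_dist, negative_dists,
                n_samples=16, temperature=0.1, seed=0):
    """Average InfoNCE over samples drawn from the predicted distributions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = sample_latent(*anchor_dist, rng)
        p = sample_latent(*positive_dist, rng)
        ns = [sample_latent(*d, rng) for d in negative_dists]
        total += info_nce(a, p, ns, temperature)
    return total / n_samples


# Each distribution is (mean direction, kappa); values here are illustrative.
query = ([1.0, 0.0, 0.0], 1000.0)
same = ([1.0, 0.0, 0.0], 1000.0)
opposite = ([-1.0, 0.0, 0.0], 1000.0)
other = ([0.0, 1.0, 0.0], 1000.0)

loss_matched = mc_info_nce(query, same, [opposite, other])
loss_mismatched = mc_info_nce(query, opposite, [same, other])
```

In this toy setup `loss_matched` is smaller than `loss_mismatched`, because the positive's distribution sits on the query's mean direction only in the first case. Lowering `kappa` for an input spreads its samples over the sphere, which is how a per-input (heteroscedastic) level of uncertainty would enter such an objective.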