CV | AI | Mar 20, 2022

Partitioning Image Representation in Contrastive Learning

arXiv:2203.10454v3 | 1 citation | h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses a limitation of contrastive learning for image representation by offering a method that better handles the effects of data augmentation, though it appears incremental because it builds on existing frameworks such as VAE and BYOL.

The paper tackles the problem of forcing identical representations for augmented samples in contrastive learning by introducing a partitioned representation with content and style parts to capture common and unique features. It shows improved separability in VAE and outperforms BYOL in linear separability and few-shot learning tasks.

In contrastive learning in the image domain, the anchor and positive samples are forced to have representations that are as close as possible. However, forcing the two samples to have the same representation can be misleading, because data augmentation makes the two samples different. In this paper, we introduce a new representation, the partitioned representation, which can learn both the common and the unique features of the anchor and positive samples in contrastive learning. The partitioned representation consists of two parts: the content part and the style part. The content part represents the common features of the class, and the style part represents the features specific to each sample, which can lead to a representation of the data augmentation method. We obtain the partitioned representation simply by decomposing the contrastive loss function into two terms, one on each of the two separate representations. To evaluate the two-part representation, we use two frameworks: the Variational AutoEncoder (VAE), to show the separability of content and style, and Bootstrap Your Own Latent (BYOL), to confirm the generalization ability in classification. Our experiments show that the approach separates the two types of information in the VAE framework and outperforms conventional BYOL on two downstream tasks: linear separability and few-shot learning.
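
The loss decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual loss terms, so everything below is an assumption made for illustration: the function names, the `content_dim` split point, the `style_weight` parameter, and in particular the style term, which here simply penalizes style parts that point in the same direction. The content term uses the BYOL-style distance 2 - 2·cos between the two content parts.

```python
import math


def split_representation(z, content_dim):
    """Split a flat representation into (content, style) parts.

    `content_dim` is a hypothetical split point, not a value from the paper.
    """
    return z[:content_dim], z[content_dim:]


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def partitioned_loss(z_anchor, z_positive, content_dim, style_weight=1.0):
    """Return (total, content_term, style_term) for one anchor/positive pair."""
    c_a, s_a = split_representation(z_anchor, content_dim)
    c_p, s_p = split_representation(z_positive, content_dim)

    # Content term: BYOL-style distance, zero when the content parts align.
    content_term = 2.0 - 2.0 * cosine_similarity(c_a, c_p)

    # Style term (ASSUMPTION, not taken from the paper): penalize style parts
    # that point in the same direction, so each view keeps its own features.
    style_term = max(0.0, cosine_similarity(s_a, s_p))

    total = content_term + style_weight * style_term
    return total, content_term, style_term


if __name__ == "__main__":
    anchor = [1.0, 0.0, 1.0, 0.0]    # first two dims: content, last two: style
    positive = [2.0, 0.0, 0.0, 3.0]  # same content direction, different style
    print(partitioned_loss(anchor, positive, content_dim=2))
```

In this toy setting, two views that share a content direction but differ in style incur no loss, while two identical views are penalized only through the style term. A real implementation would compute both terms on batched network outputs rather than on hand-written lists.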

