LG CV | Nov 2, 2022

EquiMod: An Equivariance Module to Improve Self-Supervised Learning

arXiv:2211.01244v219.223 citations | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a trade-off in self-supervised learning for computer vision, offering a generic solution to retain beneficial augmentation information, though it is incremental as it builds on existing invariance models.

The paper tackles the problem of self-supervised learning methods losing augmentation-related information by introducing EquiMod, a module that structures the latent space to predict displacement from augmentations, resulting in performance improvements on CIFAR10 and ImageNet datasets.

Self-supervised visual representation methods are closing the gap with supervised learning performance. These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations. This can be seen as a task that encourages embeddings to leave out factors modified by these augmentations, i.e. to be invariant to them. However, this only considers one side of the trade-off in the choice of the augmentations: they need to strongly modify the images to avoid simple shortcut solutions (e.g. using only color histograms), but on the other hand, augmentation-related information may then be missing from the representations for some downstream tasks (e.g. color is important for bird and flower classification). A few recent works have proposed to mitigate the problem of using only an invariance task by exploring some form of equivariance to augmentations. This has been done by learning additional embedding space(s), where some augmentation(s) cause embeddings to differ, yet in a non-controlled way. In this work, we introduce EquiMod, a generic equivariance module that structures the learned latent space, in the sense that our module learns to predict the displacement in the embedding space caused by the augmentations. We show that applying that module to state-of-the-art invariance models, such as SimCLR and BYOL, improves performance on the CIFAR10 and ImageNet datasets. Moreover, while our model could collapse to a trivial equivariance, i.e. invariance, we observe that it instead automatically learns to keep some augmentation-related information beneficial to the representations.
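To make the idea of "predicting the displacement in the embedding space" concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. The linear predictor, the names `predict_displaced` and `equivariance_loss`, the cosine-based loss, and all numbers are assumptions chosen for illustration only. The sketch contrasts a predictor that uses the augmentation parameters with one that ignores them, which corresponds to the trivial equivariance (invariance) the abstract mentions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def predict_displaced(z, t, weights):
    """Hypothetical linear predictor (an assumption, not the paper's module).

    Maps the concatenation of an embedding z and augmentation parameters t
    to a predicted embedding of the augmented image.
    """
    inputs = z + t  # list concatenation: [z ; t]
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def equivariance_loss(z_pred, z_aug):
    """Assumed loss: 1 - cosine similarity between predicted and actual embedding."""
    return 1.0 - cosine(z_pred, z_aug)

# Toy data (made up): a 2-d embedding and one augmentation parameter,
# e.g. the strength of a color shift.
z = [1.0, 0.0]        # embedding of the original image
t = [0.5]             # augmentation parameter
z_aug = [1.0, 0.5]    # embedding of the augmented image (moved along dim 2)

# A predictor that uses t to move the embedding along the second dimension.
w_equivariant = [[1.0, 0.0, 0.0],
                 [0.0, 1.0, 1.0]]

# A predictor that ignores t: it predicts no displacement at all.
w_invariant = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]]

loss_equivariant = equivariance_loss(predict_displaced(z, t, w_equivariant), z_aug)
loss_invariant = equivariance_loss(predict_displaced(z, t, w_invariant), z_aug)

print("equivariant predictor loss:", loss_equivariant)
print("invariant predictor loss:", loss_invariant)
```

In this toy setting the augmentation actually moves the embedding, so the predictor that ignores the augmentation parameters incurs a higher loss than the one that uses them. If the encoder were fully invariant (`z_aug` equal to `z`), both predictors could reach zero loss, which is the collapse case the abstract describes.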
