CV · Nov 5, 2022

Local Manifold Augmentation for Multiview Semantic Consistency

Yu Yang, Wing Yin Cheung, Chang Liu, Xiangyang Ji

arXiv:2211.02798v1 · 4.84 citations · h-index: 17

Originality: Incremental advance

AI Analysis

This paper addresses the need for better data augmentations in self-supervised learning for computer vision. It offers a new method for improving representation learning, though the advance is incremental because it builds on existing frameworks such as MoCov2 and SimSiam.

The paper tackles the problem of limited data augmentations in multiview self-supervised learning by proposing local manifold augmentation (LMA) to simulate complex intra-class variations, resulting in consistent improvements on benchmarks like CIFAR10, CIFAR100, STL10, ImageNet100, and ImageNet, with enhanced invariance to viewpoint, pose, and illumination changes.

Multiview self-supervised representation learning is rooted in exploring semantic consistency across data with complex intra-class variation. Such variation is not directly accessible and is therefore simulated by data augmentations. However, commonly adopted augmentations are handcrafted and limited to simple geometric and color changes, which cannot cover the abundant intra-class variation. In this paper, we propose to extract the underlying data variation from datasets and construct a novel augmentation operator, named local manifold augmentation (LMA). LMA is achieved by training an instance-conditioned generator to fit the distribution on the local manifold of data and then sampling multiview data from it. LMA can create an infinite number of data views, preserve semantics, and simulate complicated variations in object pose, viewpoint, lighting condition, background, etc. Experiments show that with LMA integrated, self-supervised learning methods such as MoCov2 and SimSiam gain consistent improvements on prevalent benchmarks including CIFAR10, CIFAR100, STL10, ImageNet100, and ImageNet. Furthermore, LMA leads to representations with greater invariance to viewpoint, object pose, and illumination changes, and stronger robustness to various real distribution shifts reflected by ImageNet-V2, ImageNet-R, ImageNet Sketch, etc.
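The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how LMA is combined with handcrafted augmentations, so the mixing probability `p_lma`, the stand-in `toy_generator`, and the toy flip augmentation are all assumptions made for illustration. A real setup would use a trained instance-conditioned generator and image tensors.

```python
import random
from typing import Callable, List

# Flattened pixel values in [0, 1]; a stand-in for a real image tensor.
Image = List[float]


def toy_generator(x: Image, z: List[float]) -> Image:
    """Hypothetical stand-in for a trained instance-conditioned generator G(x, z).

    It only adds small noise around x and clips to [0, 1]; a real generator
    would produce semantically consistent variations of the instance x.
    """
    return [min(1.0, max(0.0, p + 0.1 * n)) for p, n in zip(x, z)]


def handcrafted_aug(x: Image, rng: random.Random) -> Image:
    """Toy handcrafted augmentation: random flip of a 1-D 'image'."""
    return x[::-1] if rng.random() < 0.5 else list(x)


def sample_views(
    x: Image,
    generator: Callable[[Image, List[float]], Image],
    rng: random.Random,
    n_views: int = 2,
    p_lma: float = 0.5,  # assumed mixing probability, not from the paper
) -> List[Image]:
    """Return n_views views of x for a multiview SSL objective."""
    views = []
    for _ in range(n_views):
        v = x
        if rng.random() < p_lma:
            # Draw a fresh latent code so each call yields a different view.
            z = [rng.gauss(0.0, 1.0) for _ in x]
            v = generator(x, z)
        views.append(handcrafted_aug(v, rng))
    return views


views = sample_views([0.2, 0.5, 0.9], toy_generator, random.Random(0))
```

The resulting views would then be passed to a method such as MoCov2 or SimSiam in place of views produced by handcrafted augmentations alone.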
