Residual Relaxation for Multi-view Representation Learning
This work addresses a bottleneck in multi-view representation learning for computer vision: its dependence on the choice of data augmentation. It offers an incremental improvement that lets these methods handle more diverse augmentations.
The paper observes that strong data augmentations such as image rotation harm multi-view representation learning because they cause large semantic shifts. It proposes a relaxation method, Prelax, that improves performance with both existing augmentations and stronger ones like rotation across different backbones.
Multi-view methods learn representations by aligning multiple views of the same image, and their performance largely depends on the choice of data augmentation. In this paper, we notice that some otherwise useful augmentations, such as image rotation, are harmful for multi-view methods because they cause a semantic shift that is too large to be aligned well. This observation motivates us to relax the exact alignment objective so that it can better accommodate stronger augmentations. Taking image rotation as a case study, we develop a generic approach, Pretext-aware Residual Relaxation (Prelax), that relaxes the exact alignment by allowing an adaptive residual vector between different views and encodes the semantic shift through pretext-aware learning. Extensive experiments on different backbones show that our method not only improves multi-view methods with existing augmentations, but also benefits from stronger image augmentations like rotation.
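To make the idea of relaxing exact alignment concrete, the following is a minimal sketch and not the paper's actual objective. It contrasts an exact alignment loss, which pushes the two view embeddings to coincide, with a relaxed loss that tolerates a residual between them and asks that residual to predict the applied rotation. The function names, the hinge-style `margin`, the linear pretext head `W`, and the `weight` parameter are all hypothetical choices made for illustration.

```python
import math

def exact_alignment_loss(z1, z2):
    """Squared distance between two view embeddings (zero only if identical)."""
    return sum((a - b) ** 2 for a, b in zip(z1, z2))

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_norm - logits[label]

def relaxed_loss(z1, z2, W, label, margin=1.0, weight=1.0):
    """Illustrative relaxed alignment (assumed form, not the paper's loss).

    z1, z2 : embeddings of the original and the rotated view
    W      : hypothetical linear pretext head, one row of weights per rotation class
    label  : index of the rotation that was applied to the second view
    """
    # The residual carries whatever the augmentation changed in the embedding.
    residual = [b - a for a, b in zip(z1, z2)]
    sq_norm = sum(r * r for r in residual)
    # Assumed relaxation: only penalize the residual beyond a tolerated margin.
    align = max(sq_norm - margin, 0.0)
    # Pretext-aware term: the residual should reveal which rotation was applied.
    logits = [sum(w * r for w, r in zip(row, residual)) for row in W]
    return align + weight * cross_entropy(logits, label)
```

With a small residual, the exact loss is still positive while the alignment part of the relaxed loss vanishes, leaving only the pretext term. That is the behavior the relaxation is meant to illustrate: the views may differ, as long as the difference is informative about the augmentation.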