Cross-Architectural Positive Pairs improve the effectiveness of Self-Supervised Learning
This work addresses the efficiency and robustness problems that self-supervised learning poses for researchers and practitioners. It proposes a hybrid CNN-Transformer method rather than an incremental refinement of an existing approach.
The paper tackles two problems in self-supervised learning: high computational requirements, and performance that drops when the batch size or the number of pretraining epochs is reduced. It introduces CASS, which uses a Transformer and a CNN simultaneously. Across four datasets, the reported average gains range from 3.8% (1% labeled data) to 10.13% (100% labeled data), while taking 69% less time.
Existing self-supervised techniques have extreme computational requirements and suffer a substantial drop in performance when the batch size or the number of pretraining epochs is reduced. This paper presents Cross Architectural - Self Supervision (CASS), a novel self-supervised learning approach that leverages a Transformer and a CNN simultaneously. We empirically show that, across four diverse datasets, CASS-trained CNNs and Transformers gained an average of 3.8% with 1% labeled data, 5.9% with 10% labeled data, and 10.13% with 100% labeled data over existing state-of-the-art self-supervised learning approaches, while taking 69% less time. We also show that CASS is much more robust to changes in batch size and training epochs than those approaches. We have open-sourced our code at https://github.com/pranavsinghps1/CASS.
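
The abstract does not state the training objective, so the following is only a minimal sketch of how a cross-architectural positive pair could be scored. It assumes that the CNN and the Transformer each produce an output vector for the same image, and that the two vectors are pulled together with a cosine-similarity loss of the form 2 - 2·cos. The loss form and the function names (`cosine_similarity`, `cross_arch_loss`, `batch_loss`) are illustrative assumptions, not taken from the CASS repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cross_arch_loss(cnn_out, transformer_out):
    """Assumed loss for one positive pair: 2 - 2 * cos.

    It is 0 when the two architectures' outputs point in the same
    direction and 4 when they point in opposite directions.
    """
    return 2.0 - 2.0 * cosine_similarity(cnn_out, transformer_out)


def batch_loss(cnn_batch, transformer_batch):
    """Mean pair loss over a batch; the i-th entries come from the same image."""
    losses = [cross_arch_loss(c, t) for c, t in zip(cnn_batch, transformer_batch)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical outputs of the two networks for two images.
    cnn_batch = [[1.0, 0.0], [0.0, 1.0]]
    transformer_batch = [[2.0, 0.0], [1.0, 0.0]]
    print(batch_loss(cnn_batch, transformer_batch))
```

In this sketch the positive pair comes from two architectures seeing the same input, rather than from one architecture seeing two augmented views. In an actual training loop the vectors would be network outputs and the loss would be backpropagated through both models.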