LG, CV | Dec 22, 2022

Offline Clustering Approach to Self-supervised Learning for Class-imbalanced Image Data

arXiv:2212.11444v1 | 1.8 | h-index: 7

Originality: Synthesis-oriented

AI Analysis

This work addresses model bias in self-supervised learning on class-imbalanced datasets but is incremental, as it builds on existing methods.

The paper tackles class imbalance in self-supervised learning for image data and finds that offline clustering of features combined with knowledge distillation can improve performance, with experiments on CIFAR-10 showing gains over baseline models such as SimCLR and SimSiam.
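The clustering-and-distillation pipeline summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the clustering algorithm (k-means with a farthest-first initialisation), the number of clusters, and the distillation objective (cosine distance between each sample's student embedding and the embedding produced by the expert assigned to that sample's cluster) are all assumptions, and the helper names `kmeans`, `cluster_subsets`, and `distill_loss` are hypothetical. Real expert and student networks are replaced by precomputed embedding arrays so the example runs with NumPy alone.

```python
import numpy as np


def farthest_first_init(features, k, seed=0):
    # Assumed initialisation: one random point, then repeatedly the point
    # farthest from all centroids chosen so far (a simplification of k-means++).
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(features)))]
    min_dist = ((features - features[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, ((features - features[nxt]) ** 2).sum(axis=1))
    return features[chosen].copy()


def assign_to_centroids(features, centroids):
    # Squared Euclidean distance from every point to every centroid: shape (n, k).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(features, k, n_iter=100, seed=0):
    """Offline clustering of frozen feature vectors (k-means is an assumption)."""
    centroids = farthest_first_init(features, k, seed)
    for _ in range(n_iter):
        assign = assign_to_centroids(features, centroids)
        # Recompute each centroid as the mean of its members; keep empty ones unchanged.
        new = np.stack([
            features[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign_to_centroids(features, centroids), centroids


def cluster_subsets(assign, k):
    # Index set for each cluster; one hypothetical expert would be trained per subset.
    return [np.flatnonzero(assign == j) for j in range(k)]


def distill_loss(student_emb, expert_embs, assign):
    """Hypothetical distillation objective: mean cosine distance to the assigned expert.

    expert_embs has shape (k, n, d): every expert's embedding of every sample.
    Each sample is matched with the expert of the cluster it belongs to.
    """
    target = expert_embs[assign, np.arange(len(assign))]  # shape (n, d)
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    return float(np.mean(1.0 - (s * t).sum(axis=1)))
```

In a real setting the features passed to `kmeans` would come from a pre-trained encoder such as a SimCLR or SimSiam backbone, and `distill_loss` would be minimized with respect to the student's parameters; both steps are left abstract here.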

Class-imbalanced datasets are known to bias models towards the majority classes. In this project, we pose two research questions: 1) when is the class-imbalance problem more prevalent in self-supervised pre-training? and 2) can offline clustering of feature representations help pre-training on class-imbalanced data? Our experiments investigate the former question by adjusting the degree of class imbalance when training the baseline models, namely SimCLR and SimSiam, on the CIFAR-10 dataset. To answer the latter question, we train a separate expert model on each subset of the data defined by the feature clusters. We then distill the knowledge of the expert models into a single model so that we can compare its performance to our baselines.
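The abstract does not say how the degree of class imbalance was varied. One common way to build a class-imbalanced CIFAR-10 split, assumed here rather than taken from the paper, is to subsample each class so that per-class counts decay exponentially from the head class to the tail class, with the decay set by an imbalance ratio (largest class size divided by smallest class size). The function names `long_tailed_counts` and `make_imbalanced_indices` are hypothetical, and the labels are simulated rather than loaded from CIFAR-10.

```python
import numpy as np


def long_tailed_counts(n_max, num_classes, imbalance_ratio):
    """Assumed exponential profile: class c keeps n_max * ratio ** (-c / (C - 1)) samples."""
    counts = []
    for c in range(num_classes):
        frac = c / (num_classes - 1)  # 0 for the head class, 1 for the tail class
        counts.append(round(n_max * imbalance_ratio ** (-frac)))
    return counts


def make_imbalanced_indices(labels, counts, seed=0):
    """Randomly keep counts[c] examples of each class c; returns sorted indices."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for c, n_c in enumerate(counts):
        idx = np.flatnonzero(labels == c)
        if n_c > len(idx):
            raise ValueError(f"class {c} has only {len(idx)} examples, need {n_c}")
        keep.append(rng.choice(idx, size=n_c, replace=False))
    return np.sort(np.concatenate(keep))


# Simulated balanced label vector with CIFAR-10's 5,000 training images per class.
labels = np.repeat(np.arange(10), 5000)
counts = long_tailed_counts(n_max=5000, num_classes=10, imbalance_ratio=100.0)
subset = make_imbalanced_indices(labels, counts)
```

Under this assumed scheme, sweeping `imbalance_ratio` (1 gives the balanced set, larger values give a longer tail) yields a family of training sets on which baselines such as SimCLR and SimSiam could be pre-trained and compared.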

