LG CV Dec 22, 2022

Offline Clustering Approach to Self-supervised Learning for Class-imbalanced Image Data

arXiv:2212.11444v1, h-index: 7
Originality: Synthesis-oriented
AI Analysis

This paper addresses model bias in self-supervised learning on class-imbalanced datasets, but the contribution is incremental because it builds on existing methods.

The paper tackles class imbalance in self-supervised learning for image data. It finds that offline clustering of features combined with knowledge distillation can improve performance, with experiments on CIFAR-10 showing gains over baseline models such as SimCLR and SimSiam.

Class-imbalanced datasets are known to bias models towards the majority classes. In this project, we pose two research questions: 1) when is the class-imbalance problem more prevalent in self-supervised pre-training? and 2) can offline clustering of feature representations help pre-training on class-imbalanced data? Our experiments investigate the former question by varying the degree of class imbalance when training the baseline models, namely SimCLR and SimSiam, on the CIFAR-10 dataset. To answer the latter question, we train one expert model on each subset of the feature clusters. We then distill the knowledge of the expert models into a single model, so that we can compare its performance to our baselines.
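The abstract does not say how the degree of class imbalance was adjusted. As a minimal sketch of one common way to do it, the snippet below subsamples a balanced label list so that class sizes decay exponentially from the largest class to the smallest. The exponential profile, the function name `imbalanced_indices`, and the `imbalance_ratio` parameter are assumptions for illustration, not the authors' procedure.

```python
import random
from collections import Counter


def imbalanced_indices(labels, imbalance_ratio, seed=0):
    """Return sample indices whose class sizes decay exponentially.

    imbalance_ratio is (largest class size) / (smallest class size);
    a value of 1 keeps the subset balanced. Hypothetical helper,
    not taken from the paper.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in classes}
    # Cap every class at the size of the smallest class in the input.
    n_max = min(len(members) for members in by_class.values())
    kept = []
    for rank, c in enumerate(classes):
        # frac runs from 0 (first class, full size) to 1 (last class, smallest).
        frac = rank / max(len(classes) - 1, 1)
        n_keep = max(1, round(n_max / imbalance_ratio ** frac))
        kept.extend(rng.sample(by_class[c], n_keep))
    return sorted(kept)


# Toy stand-in for CIFAR-10 labels: 10 classes with 100 samples each.
labels = [c for c in range(10) for _ in range(100)]
idx = imbalanced_indices(labels, imbalance_ratio=10)
counts = Counter(labels[i] for i in idx)
```

Here `counts` maps each class to its retained sample count, so sweeping `imbalance_ratio` gives a series of training subsets with increasing imbalance on which baselines such as SimCLR or SimSiam could be pre-trained.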

