LG · Feb 6, 2023

Linking data separation, visual separation, and classifier performance using pseudo-labeling by contrastive learning

arXiv:2302.02663v1 · 5 citations · h-index: 50
Originality: Incremental advance
AI Analysis

This work addresses the challenge of expensive supervision in medical and biological data, offering an incremental improvement to existing pseudo-labeling methods.

The paper tackles the problem of limited supervised data when training deep neural networks, particularly for medical image classification. It proposes using contrastive learning to improve separation in the latent space and shows that data separation, visual separation, and classifier performance are correlated, evaluating on five human intestinal parasite datasets in which only 1% of the samples are supervised.
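The contrastive objectives named in the abstract (SimCLR and SupCon) can be sketched as a single loss. The snippet below is a minimal numpy forward pass of the supervised contrastive loss (the variant that averages over positives outside the log), written under the assumption of L2-normalized embeddings and a temperature of 0.1; it is not the authors' code. The function name `supcon_loss` is hypothetical. Passing one instance id per source image, so that only the two augmented views of the same image count as positives, reduces it to SimCLR's NT-Xent loss, which is one way the two methods relate.

```python
import numpy as np


def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss on one batch (forward pass only).

    z      : (n, d) array of embeddings, one row per augmented view.
    labels : length-n array; rows sharing a label are positives.
             Using one instance id per source image gives SimCLR (NT-Xent).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # an anchor never contrasts with itself

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                                   # skip anchors without positives
    mean_log_prob_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())


# Toy batch: two well-separated directions in 2D.
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(supcon_loss(z, [0, 0, 1, 1]))  # small: positives already aligned
print(supcon_loss(z, [0, 1, 0, 1]))  # large: positives point apart
```

Minimizing this loss pulls same-label embeddings together and pushes others apart, which is the kind of latent-space separation the paper links to downstream performance.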

A lack of supervised data is an issue when training deep neural networks (DNNs), especially for medical and biological data, where supervision is expensive. Recently, Embedded Pseudo-Labeling (EPL) addressed this problem by using a non-linear projection (t-SNE) from a feature space of the DNN to a 2D space, followed by semi-supervised label propagation with a connectivity-based method (OPFSemi). We argue that the performance of the final classifier depends on the data separation present in the latent space and on the visual separation present in the projection. We address this, first, by proposing to use contrastive learning to produce the latent space for EPL, via two methods (SimCLR and SupCon) and their combination, and second, by showing through an extensive set of experiments the aforementioned correlations between data separation, visual separation, and classifier performance. We demonstrate our results on the classification of five challenging real-world image datasets of human intestinal parasites with only 1% supervised samples.
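The propagation stage of EPL can be illustrated on 2D points. The sketch below is a simplified stand-in, not OPFSemi itself: it assumes the t-SNE coordinates are already computed, builds a complete graph with Euclidean arc weights, and lets each unlabeled sample take the label of the seed that reaches it through the path whose largest arc is smallest (the f_max path cost used by optimum-path forest methods). The function name `opf_fmax_propagate` and the toy data are hypothetical.

```python
import numpy as np


def opf_fmax_propagate(points, labels):
    """Propagate labels from a few seeds over a complete graph of 2D points.

    points : (n, 2) array, e.g. t-SNE coordinates of DNN features.
    labels : length-n int array, class id for supervised samples, -1 otherwise.
    Returns (pseudo_labels, path_costs).
    """
    points = np.asarray(points, dtype=float)
    pseudo = np.array(labels, dtype=int)
    cost = np.where(pseudo >= 0, 0.0, np.inf)   # seeds start with cost 0
    done = np.zeros(len(points), dtype=bool)

    for _ in range(len(points)):
        # Dijkstra-like step: conquer the cheapest unfinished sample.
        s = int(np.argmin(np.where(done, np.inf, cost)))
        if done[s] or np.isinf(cost[s]):
            break                                 # remaining samples unreachable
        done[s] = True
        arc = np.linalg.norm(points - points[s], axis=1)
        offered = np.maximum(cost[s], arc)        # f_max: worst arc along the path
        better = ~done & (offered < cost)
        cost[better] = offered[better]
        pseudo[better] = pseudo[s]
    return pseudo, cost


# Two well-separated clusters with one supervised sample each.
pts = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
seeds = np.array([0, -1, -1, 1, -1, -1])
pseudo, _ = opf_fmax_propagate(pts, seeds)
print(pseudo)   # [0 0 0 1 1 1]
```

Because the f_max cost only grows when a path crosses a long arc, labels spread easily within a visually separated cluster and stop at gaps between clusters, which matches the intuition that better visual separation in the projection should yield more accurate pseudo-labels. The complete graph costs O(n²) time and is only meant for small illustrations.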
