Weakly Supervised Clustering by Exploiting Unique Class Count
This work addresses the problem of reducing annotation costs for clustering tasks in domains such as medical imaging, although it is incremental, building on existing weakly supervised and multiple instance learning methods.
The paper tackles clustering without instance-level labels by introducing a weakly supervised framework based on bag-level unique class count ($ucc$) labels. In experiments, including a real-world semantic segmentation task for breast cancer metastases, the framework achieves performance comparable to fully supervised models.
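A bag's $ucc$ label is the number of distinct classes among the instances it contains. The sketch below is a minimal illustration, assuming bags are formed by uniform random sampling from a labeled toy dataset. That sampling scheme is an assumption, not necessarily how the paper builds its bags, and the function names `ucc` and `make_ucc_bags` are hypothetical. Instance labels appear here only to synthesize the weak label. A model trained on the output would see `(bag, ucc)` pairs and never the per-instance labels. In a real weakly supervised setting, the $ucc$ label would come directly from bag-level annotation.

```python
import random
from typing import List, Sequence, Tuple


def ucc(labels: Sequence[int]) -> int:
    """Unique class count: number of distinct classes among a bag's instances."""
    return len(set(labels))


def make_ucc_bags(
    instances: Sequence[object],
    labels: Sequence[int],
    bag_size: int,
    num_bags: int,
    seed: int = 0,
) -> List[Tuple[List[object], int]]:
    """Sample bags (hypothetical helper) and attach only the bag-level ucc label.

    Instance labels are used solely to generate the weak label; the returned
    pairs contain no per-instance annotation.
    """
    rng = random.Random(seed)
    indices = list(range(len(instances)))
    bags = []
    for _ in range(num_bags):
        # Uniform sampling without replacement (an assumption for this sketch).
        chosen = rng.sample(indices, bag_size)
        bag = [instances[i] for i in chosen]
        bags.append((bag, ucc([labels[i] for i in chosen])))
    return bags
```

For example, a bag whose instances belong to classes `[3, 3, 7, 1]` gets `ucc([3, 3, 7, 1]) == 3`. Once the bag is built, the information about which instance carries which class is discarded.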
This paper proposes a clustering framework based on weakly supervised learning. At the core of this framework, we introduce a novel multiple instance learning task based on a bag-level label called unique class count ($ucc$), which is the number of unique classes among all instances inside the bag. In this task, no annotations on individual instances inside the bag are needed during training. We mathematically prove that, given a perfect $ucc$ classifier, perfect clustering of the individual instances inside the bags is possible even when no instance-level annotations are given during training. We construct a neural network-based $ucc$ classifier and show experimentally that the clustering performance of our framework with this weakly supervised $ucc$ classifier is comparable to that of fully supervised learning models in which the labels of all instances are known. Furthermore, we test the applicability of our framework on a real-world task, semantic segmentation of breast cancer metastases in histological lymph node sections, and show that its performance is comparable to that of a fully supervised U-Net model.
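To build intuition for why a perfect $ucc$ classifier is enough to cluster, here is a simple illustrative argument. It is *not* the paper's proof or its neural architecture. If a perfect classifier can be queried on arbitrary bags, then a two-instance bag has $ucc = 1$ exactly when both instances share a class. Comparing each new instance against one representative per existing cluster therefore recovers the true partition, up to relabeling of clusters. The oracle and the function name `cluster_with_ucc_oracle` are hypothetical stand-ins for a perfect $ucc$ classifier.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_with_ucc_oracle(
    instances: Sequence[T], oracle: Callable[[Sequence[T]], int]
) -> List[List[T]]:
    """Group instances using only a (hypothetical) perfect ucc oracle.

    The oracle maps any bag of instances to its true unique class count;
    no instance label is ever read directly.
    """
    clusters: List[List[T]] = []
    for x in instances:
        for cluster in clusters:
            # A two-instance bag has ucc == 1 exactly when both share a class.
            if oracle([cluster[0], x]) == 1:
                cluster.append(x)
                break
        else:
            # x matched no existing cluster, so it starts a new one.
            clusters.append([x])
    return clusters


# Toy usage: hidden labels are visible only inside the simulated oracle.
hidden = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}
oracle = lambda bag: len({hidden[i] for i in bag})
print(cluster_with_ucc_oracle(["a", "b", "c", "d", "e"], oracle))
# → [['a', 'c'], ['b', 'e'], ['d']]
```

This uses at most one oracle query per (instance, cluster) pair. A learned, imperfect classifier would not give such clean pairwise answers, which is why a practical framework needs a trained model rather than this brute-force procedure.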