VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models
This work addresses the challenge of discovering unknown classes from unlabelled data in computer vision applications, using a multimodal approach that combines image and text features.
The paper tackles novel class discovery in images by fusing visual-textual semantics with prototype-guided clustering, achieving up to a 25.3% improvement in accuracy on unknown classes on CIFAR-100.
Novel Class Discovery (NCD) aims to utilise prior knowledge of known classes to classify and discover unknown classes from unlabelled data. Existing NCD methods for images rely primarily on visual features, which suffer from limitations such as insufficient feature discriminability and the long-tail distribution of data. We propose LLM-NCD, a multimodal framework that breaks this bottleneck by fusing visual-textual semantics with prototype-guided clustering. Our key innovation has two parts: modelling the cluster centres and semantic prototypes of known classes by jointly optimising known-class image and text features, and a dual-phase discovery mechanism that dynamically separates known from novel samples via semantic affinity thresholds and adaptive clustering. Experiments on the CIFAR-100 dataset show that, compared with current methods, our method achieves up to a 25.3% improvement in accuracy on unknown classes. Notably, our method shows unique resilience to long-tail distributions, a first in the NCD literature.
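The dual-phase idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine-affinity rule, the fixed threshold of 0.8, and the use of plain k-means with a fixed number of novel clusters (in place of the adaptive clustering the abstract mentions) are all assumptions made for illustration. Feature vectors are assumed to come from some fused image-text encoder that is not shown here.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def split_known_novel(features, prototypes, threshold=0.8):
    """Phase 1 (assumed rule): a sample counts as 'known' if its best cosine
    affinity to any known-class prototype reaches the threshold."""
    sims = l2_normalize(features) @ l2_normalize(prototypes).T
    best = sims.argmax(axis=1)
    is_known = sims.max(axis=1) >= threshold
    return is_known, best

def kmeans(x, k, n_iter=50, seed=0):
    """Plain k-means, used here as a stand-in for adaptive clustering."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centres[j] = x[labels == j].mean(axis=0)
    return labels

def discover(features, prototypes, n_novel, threshold=0.8):
    """Phase 2 (assumed rule): cluster the rejected samples into n_novel groups.
    Known samples get labels 0..K-1; novel clusters get labels K..K+n_novel-1.
    Samples stay at -1 if there are too few novel candidates to cluster."""
    features = np.asarray(features, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    is_known, best = split_known_novel(features, prototypes, threshold)
    labels = np.where(is_known, best, -1)
    novel_idx = np.flatnonzero(~is_known)
    if len(novel_idx) >= n_novel:
        novel_labels = kmeans(l2_normalize(features[novel_idx]), n_novel)
        labels[novel_idx] = len(prototypes) + novel_labels
    return labels
```

In this sketch the threshold controls the trade-off between mislabelling novel samples as known (threshold too low) and pushing known samples into the clustering stage (threshold too high); how the paper sets or adapts it is not stated in the abstract.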