Simple but Effective Unsupervised Classification for Specified Domain Images: A Case Study on Fungi Images
This work addresses the costly and inefficient manual annotation required in specialized domains such as fungal imaging and offers researchers a practical way to build datasets, although the contribution is incremental because it builds on existing unsupervised and self-supervised methods.
The paper tackles the problem of efficiently obtaining high-quality labeled datasets for specialized-domain images when no annotated data is available. It proposes an unsupervised classification method that achieves 94.1% and 96.7% accuracy on public and private fungal image datasets, respectively.
High-quality labeled datasets are essential for deep learning. Traditional manual annotation is costly and inefficient, and it is especially difficult in specialized domains where expert knowledge is needed. Self-supervised methods leverage unlabeled data for feature extraction, yet they still require hundreds or thousands of labeled instances to guide the model toward effective specialized image classification. Current unsupervised learning methods classify automatically without prior annotation but often compromise on accuracy. As a result, efficiently procuring high-quality labeled datasets remains a pressing challenge for specialized-domain images that lack annotated data. To address this, an unsupervised classification method built on three key ideas is introduced: 1) dual-step feature dimensionality reduction using a pre-trained model and manifold learning, 2) a voting mechanism across multiple clustering algorithms, and 3) post-hoc instead of prior manual annotation. This approach outperforms supervised methods in classification accuracy, as demonstrated on fungal image data, where it achieves 94.1% and 96.7% accuracy on public and private datasets, respectively. The proposed method reduces dependency on pre-annotated datasets, enabling a closed loop for data classification. Its simplicity and ease of use should also help researchers in various fields build datasets, promoting AI applications for images in specialized domains.
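
The voting idea (key idea 2) can be illustrated with a minimal sketch. The abstract does not specify how votes are combined, so everything here is an assumption made for illustration: the function names `align_labels` and `vote`, the `min_votes` threshold, the brute-force label alignment, and the toy label lists standing in for the outputs of three clustering algorithms. Because cluster IDs are arbitrary, each labeling is first aligned to a reference labeling before a per-sample majority vote is taken. Samples on which the algorithms disagree are left unlabeled, which is one plausible way to hand them over to post-hoc manual annotation.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` so that its cluster IDs best match `reference`.

    Cluster IDs are arbitrary, so every permutation of the IDs is tried
    (only practical for a small number of clusters) and the one with the
    highest agreement with the reference is kept.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(n_clusters)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def vote(labelings, n_clusters, min_votes):
    """Per-sample majority vote over aligned labelings.

    Returns the winning label for each sample, or None when fewer than
    `min_votes` labelings agree (assumed here to mean "needs review").
    """
    reference = labelings[0]
    aligned = [list(reference)]
    aligned += [align_labels(reference, l, n_clusters) for l in labelings[1:]]
    result = []
    for sample_votes in zip(*aligned):
        label, count = Counter(sample_votes).most_common(1)[0]
        result.append(label if count >= min_votes else None)
    return result


# Toy labelings standing in for three clustering algorithms (hypothetical data).
labels_a = [0, 0, 0, 1, 1, 1, 2, 2]
labels_b = [2, 2, 2, 0, 0, 1, 1, 1]
labels_c = [1, 1, 0, 2, 2, 2, 0, 0]

# Require unanimous agreement; disputed samples come back as None.
consensus = vote([labels_a, labels_b, labels_c], n_clusters=3, min_votes=3)
print(consensus)  # [0, 0, None, 1, 1, None, 2, 2]
```

In this toy run, two of the eight samples receive no consensus label and would be the ones a domain expert inspects. Lowering `min_votes` to 2 labels every sample at the cost of accepting less certain assignments.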