CV · LG · Nov 27, 2024

Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data

Aoran Shen, Minghao Dai, Jiacheng Hu, Yingbin Liang, Shiru Wang, Junliang Du

arXiv:2411.18622v1 · 10.513 citations · h-index: 10 · 2024 4th International Conference on Electronic Information Engineering and Computer Communication (EIECC)

Originality: Synthesis-oriented

AI Analysis

This work addresses data scarcity in image classification for practical applications, but its contribution is incremental: it combines existing semi-supervised learning and CNN techniques.

The study addresses limited labeled data in image classification by combining self-training, a semi-supervised learning method, with a CNN. On the CIFAR-10 dataset, the authors report higher accuracy, recall, and F1 score than traditional methods such as SVM, XGBoost, and MLP.

In the 21st-century information age, with the development of big data technology, effectively extracting valuable information from massive data has become a key issue. Traditional data mining methods are inadequate when faced with large-scale, high-dimensional, and complex data, and their performance is especially limited when labeled data is scarce. This study optimizes data mining algorithms by introducing semi-supervised learning methods that improve the algorithm's ability to use unlabeled data, enabling more accurate data analysis and pattern recognition when labeled data is limited. Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification, iteratively improving the model's prediction performance. The experimental results demonstrate that the proposed method significantly outperforms traditional machine learning techniques such as Support Vector Machine (SVM), XGBoost, and Multi-Layer Perceptron (MLP) on the CIFAR-10 image classification dataset. Notable improvements were observed in key performance metrics, including accuracy, recall, and F1 score. Furthermore, the robustness and noise resistance of the semi-supervised CNN model were validated through experiments under varying noise levels, supporting its practical applicability in real-world scenarios.
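The self-training loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a nearest-centroid classifier stands in for the CNN, the data is a synthetic two-class 2D toy set rather than CIFAR-10, and the names `self_train`, `threshold`, and `max_rounds` (along with their default values) are assumptions made for illustration. The sketch only shows the general pattern: fit on labeled data, pseudo-label the unlabeled points the model is confident about, add them to the training set, and repeat.

```python
import math
import random


def centroids(points, labels):
    """Fit step: compute the mean point of each class."""
    sums = {}
    for (x, y), c in zip(points, labels):
        sx, sy, n = sums.get(c, (0.0, 0.0, 0))
        sums[c] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


def predict_proba(cents, point):
    """Softmax over negative squared distances to each class centroid."""
    classes = sorted(cents)
    scores = [-((point[0] - cents[c][0]) ** 2 + (point[1] - cents[c][1]) ** 2)
              for c in classes]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}


def predict(cents, point):
    """Return the most probable class for a point."""
    proba = predict_proba(cents, point)
    return max(proba, key=proba.get)


def self_train(labeled, labels, unlabeled, threshold=0.95, max_rounds=10):
    """Self-training: repeatedly refit and absorb confident pseudo-labels.

    threshold and max_rounds are illustrative values, not taken from the paper.
    """
    points, targets = list(labeled), list(labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, targets)
        remaining, added = [], 0
        for p in pool:
            proba = predict_proba(cents, p)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                # Confident prediction: treat it as a label from now on.
                points.append(p)
                targets.append(best)
                added += 1
            else:
                remaining.append(p)
        pool = remaining
        if added == 0 or not pool:
            break
    # Final refit on labeled plus pseudo-labeled points.
    return centroids(points, targets), len(pool)


# Toy data (an assumption): two Gaussian blobs, 3 labeled points per class.
rng = random.Random(0)


def blob(cx, cy, n):
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]


labeled = blob(0, 0, 3) + blob(6, 6, 3)
labels = [0] * 3 + [1] * 3
unlabeled = blob(0, 0, 100) + blob(6, 6, 100)
test_points = blob(0, 0, 50) + blob(6, 6, 50)
test_labels = [0] * 50 + [1] * 50

model, leftover = self_train(labeled, labels, unlabeled)
correct = sum(predict(model, p) == t for p, t in zip(test_points, test_labels))
accuracy = correct / len(test_points)
print(f"accuracy={accuracy:.2f}, unlabeled points left unused={leftover}")
```

In a CNN setting, the fit step would be a training run and the confidence would typically come from the network's softmax output; the confidence threshold then controls the trade-off between using more unlabeled data and admitting wrong pseudo-labels.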

